DeepSeek model impact on AI hardware companies

Thought it would be useful to have a thread on the DeepSeek R1 model, which is having a big downward impact on AI hardware companies such as Nvidia, Astera Labs, and Credo.

The DeepSeek model is Chinese-made, open source, and cheaper to train and run than OpenAI’s models or Meta’s Llama models. They have a phone app and web app. One of the most fascinating things about the app is that it shows the chain-of-thought reasoning for the query as it goes. Here’s a query to DeepSeek asking about Raspberry Pi. Notice how it uses phrases such as “Wait,” “I remember,” or “Maybe?”


The DeepSeek R1 model performs strongly against OpenAI’s latest public model, o1, in most cases being on par with it:


Interestingly, the R1 model was not released this weekend (it has been out for about a week), but the market seems to have woken up only today to the threat this model may pose for companies like Meta, which produces Llama. This article talks about how Meta is now scrambling to unravel and understand how the R1 model is outperforming them at a lower cost:

Here’s what a supposed insider at Meta said about it:


Overall this does seem like a threat to the model makers and makes me wonder if companies like Meta may begin to slow down hardware purchases. Alternatively, they may be able to copy the insights from DeepSeek and then apply even more compute on top. It will be interesting to see how this plays out.

The market has been pretty swift in marking down these companies. It could present some opportunities as well if the prices really crash on these AI hardware makers. Would be interested to hear other board members’ takes on the situation here.

44 Likes

Decreased price of LLM training will itself increase utilization - Jevons’ Paradox

Jevons’ Paradox, named after the English economist William Stanley Jevons, describes a situation where technological progress increases the efficiency of a resource’s use (thereby decreasing the unit cost to utilize that resource), but this leads to an increase in the overall consumption of that resource, rather than a decrease. Here’s how it works in more detail:

  • Historical Context: Jevons observed this paradox in the context of coal usage in England during the 19th century. He noted that as steam engine efficiency improved (making coal use more efficient), the demand for coal didn’t decrease; it actually increased because the lower cost of energy made it more accessible and desirable for wider use.
  • Core Concept: When the efficiency of resource usage goes up, the effective price per unit of service from that resource goes down. This reduction in cost can lead to an increase in demand because:
    • The resource becomes more affordable for existing uses.
    • It enables new uses that were previously not economically viable.
    • It might encourage increased usage simply because the cost barrier is lower.
  • Examples:
    • Energy Efficiency: More efficient light bulbs or appliances might reduce energy use per unit, but if the cost of using these devices falls, people might use more lighting or run appliances longer, thus increasing overall energy consumption.
    • Water Use: Efficient irrigation systems might reduce water use per crop, but if the cost of watering goes down, farmers might irrigate more land or grow crops that require more water.
  • Implications:
    • Environmental Policy: This paradox has significant implications for environmental sustainability strategies. Simply improving efficiency might not lead to less resource depletion if not coupled with policies that manage or cap total usage.
    • Economic Policy: It suggests that efficiency gains in production might lead to increased consumption unless there are mechanisms in place to curb demand or incentivize conservation.
  • Counteractions: To mitigate Jevons’ Paradox, strategies might include:
    • Regulation: Setting limits on total resource use.
    • Pricing Mechanisms: Increasing the price of the resource through taxes or cap-and-trade systems to offset efficiency gains.
    • Behavioral Changes: Encouraging societal shifts towards less consumption through education or cultural change.

Jevons’ Paradox thus illustrates a counterintuitive outcome where attempts to use less of a resource through efficiency can sometimes result in using more of it. This concept is crucial in discussions about sustainability, resource management, and economic policy.
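
To make the rebound effect concrete, here is a minimal numeric sketch (all numbers hypothetical) using a simple constant-elasticity demand curve. The point it illustrates: when demand elasticity exceeds 1, an efficiency gain actually increases total resource consumption.

```python
# Minimal sketch of the Jevons rebound effect (all numbers hypothetical).
# Demand for a "service" (e.g., tokens of LLM output) follows a simple
# constant-elasticity curve: demand = k * price^(-elasticity).

def resource_consumed(efficiency: float, elasticity: float,
                      base_price: float = 1.0, k: float = 100.0) -> float:
    """Resource units consumed after an efficiency improvement.

    efficiency: units of service delivered per unit of resource.
    A 2x efficiency gain halves the effective price per unit of service.
    """
    price_per_service = base_price / efficiency
    service_demanded = k * price_per_service ** (-elasticity)
    return service_demanded / efficiency  # resource = service / efficiency

for eff in (1.0, 2.0, 4.0):
    print(eff, round(resource_consumed(eff, elasticity=1.5), 1))
# With elasticity > 1, doubling efficiency *increases* total resource use:
# 1.0 -> 100.0, 2.0 -> 141.4, 4.0 -> 200.0
```

With elasticity below 1, the same calculation shows consumption falling, which is why the paradox only bites when cheaper access unlocks a lot of latent demand.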

Expert opinion regarding the impact of DeepSeek specifically is well summarized below:

![DeepSeek R1 - The Chinese AI “Side Project” That Shocked the Entire Industry!](https://us1.discourse-cdn.com/motleyfoolfree/original/3X/e/0/e04b5422f93b1b1ff64f67fa57b0c86a7b790fe3.jpeg)

Best

Jason

50 Likes

Yes, these are suffering badly today, with CRDO down nearly 25% on the day so far and ALAB down over 20%.
Should we be concerned for these companies in light of DeepSeek? Or is the market simply overreacting, as it so often does?

Jonathan

6 Likes

Adding a thought: everything I have read today says something like, “they tell us it is cheaper”…so I am taking that with a huge grain of salt right now. I did some nibbling at Nvidia, Serve, and Broadcom with the drops today.

8 Likes

I also find it…interesting…that in an AI arms race we’re going to blindly trust what China is telling us they spent.

Furthermore, my understanding of all of this AI investment is that it’s not just so these companies can develop the best chatbot. It’s to build out the infrastructure for many different uses of AI that might be much more complex than chatbots, such as robotics, new product development, etc., and many things in the future that we haven’t even thought of yet.

So I’m thinking it’s an overreaction, but I’d also like to see big tech reassuring investors.

28 Likes

Smorgasbord1 from the other side of TMF,
commenting on the DeepSeek 5-alarm fire:

5 Likes

I found this blog post this morning which provided a great deal more context for why DeepSeek is a concern/relevant. The Short Case for Nvidia Stock | YouTube Transcript Optimizer

I didn’t fully appreciate the role of inference and chain-of-thought in this discussion. This blog did a nice job of setting the table.

18 Likes

Well, this was definitely a big bomb for the data center stocks. DeepSeek has a major disadvantage compared to other LLMs: the context window is too small. For example, if you continue chatting with the model, at some point it will fail to take all of your previous conversation as input.
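
To illustrate what a too-small context window means in practice, here’s a minimal sketch (hypothetical token counts; real chat apps count tokens with the model’s tokenizer, not whitespace) of why early turns of a long chat get dropped:

```python
# Minimal sketch of context-window truncation (hypothetical numbers; real
# chat apps use a tokenizer, not whitespace splits, and limits vary by model).

CONTEXT_LIMIT = 4096  # max tokens the model can take as input

def fit_history(messages: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    """Keep only the most recent messages that fit in the context window."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-first
        tokens = len(msg.split())           # crude stand-in for tokenization
        if used + tokens > limit:
            break                           # everything older is dropped
        kept.append(msg)
        used += tokens
    return list(reversed(kept))             # restore chronological order

history = [f"message {i} " * 50 for i in range(300)]   # ~100 tokens each
print(len(fit_history(history)))  # only the most recent ~40 turns survive
```

The earliest turns silently disappear from the model’s input, which is why a model with a small window “forgets” what you said at the start of a long conversation.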

The market may not have realized that there was actually already another China-made model that is claimed to be on par with GPT-4o and to have a context window 20 times longer than other LLMs. It’s called MiniMax, and here’s the paper. This model was also trained with about 1/10 of ChatGPT’s GPU cost, similar to DeepSeek.

It’s not good news for data center stocks. If other LLM giants follow this tech, data center costs could potentially be cut significantly.

Luffy

19 Likes

Wall Street appears to be getting a couple of things wrong. However, the real impact is yet to be determined and it does look like DeepSeek will have a real impact.

The startup spent just $5.5 million on training DeepSeek V3—a figure that starkly contrasts with the billions typically invested by its competitors.

This conflates training costs with infrastructure purchase/set-up costs. It’s like saying “I drove this Chinese sports car that’s faster and redder than a $250k Ferrari cross-country, but for only $10,000.” The training cost is basically the cost of renting time on the servers involved. Yeah, that Ferrari costs $250k, but you can rent one for $1,800/day, so that 10-day cross-country trip comparison is $18,000. Certainly $10,000 is better than $18,000, but it’s not the orders-of-magnitude difference analysts and article authors are writing about. (The numbers I’m quoting aren’t scaled to actual GPU rental rates, btw; this is just illustrative.)
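
To make the rental framing concrete, here’s the back-of-envelope arithmetic in code. The hours, rental rate, and per-GPU purchase price below are hypothetical placeholders, chosen only to show how a headline training figure can arise from renting time rather than owning the cluster:

```python
# Back-of-envelope: training cost as GPU time rental (hypothetical rates).
gpus = 2_048            # cluster size DeepSeek claims for V3
hours = 1_400           # hypothetical wall-clock training time
rate = 2.00             # hypothetical $/GPU-hour rental rate

training_cost = gpus * hours * rate
print(f"Training (rental) cost: ${training_cost:,.0f}")   # ~$5.7M

# Versus *owning* the cluster (hypothetical $30k per GPU, before networking,
# power, and facilities), which is the number Wall Street is comparing to:
purchase_cost = gpus * 30_000
print(f"Hardware purchase cost: ${purchase_cost:,.0f}")   # ~$61M
```

Same cluster, very different numbers depending on whether you count rental hours or capital expenditure, which is exactly the Ferrari-versus-road-trip distinction.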

What do cheaper AI deployments mean? It could mean that the push for 100k servers in a data center goes away. That could mean that the data center power companies (POWL, VRT, etc.) are hurt, and that the advanced data center networking companies (Broadcom, ANET, ALAB) are hurt. But the networking companies have other businesses as well.

For Nvidia, it could mean that instead of a dozen companies buying 100k GPUs, there are thousands of companies buying 10k GPUs (or something along those lines). I agree with @WillO2028’s take on Jevons’ Paradox: cheaper AI means more use of AI by more players.

However, what we don’t know is how the DeepSeek team’s use of lesser GPUs translates to a world where faster GPUs are not only available but also have better price/compute and power/compute ratios. Despite Blackwell’s higher price over Hopper, the cost per unit of GPU compute and the power needed for that compute are better (cheaper) than Hopper’s. And DeepSeek in no way trained on less than thousands of GPUs (they claim 2,048 GPUs less capable than the H100, but that hasn’t been verified), so there would still be an advantage to using Nvidia’s latest and greatest to simplify the data center build-out.

OTOH, being able to use lesser GPUs opens up the potential market for companies like AMD, and makes the ASICs being developed by Amazon and Google, etc. potentially more viable. Amazon in particular had been going down a low-cost data center route previously, and now if the DeepSeek software advancements can be applied there, Amazon (and Anthropic) might be in a good place.

Additionally, AI software companies (like PLTR) should be better off now, as they can improve their models to make their services less costly for their customers to run.

So, the across-the-board AI blood-bath seems clearly overdone to me, but there are still some questions to be answered before we know who all the winners and losers are.

51 Likes

There is so much here. I tend to make bullets to think about aspects of an issue, which helps me segment complex topics into scopes and perspectives:

  1. The bottom end of the AI spectrum was not well defined, as everyone (that we knew of) was trying to hit home runs/grand slams. DeepSeek is a paradigm validator, indicating both that the high-performance portion of the spectrum is smaller and that the value-oriented segment is now better defined and more attainable by “the herd” who do not need to be FIRST/best/MORE.

  2. The top end of the spectrum has lost no pace, but a significant portion of its breadth is in question. While we still see the bleeding edge of speed and performance sought after by anyone with a few spare $100s of billions, the TAM for “bleeding edge” has been considerably reduced. AI in your pocket will not contain Grace Blackwell, but it might not EVER need to. Other intermediary use cases now have value opportunities to do more with less. The top of the mountain is still rising from the sea, but the peak is steeper, with less room for competition. ASICs up, value up, bleeding-edge GPUs down, GPU margins down on the whole, GPU margins preserved for the top-tier winners.

  3. Infrastructure backlogs for data center creation/expansion will need to be evaluated for a more tiered approach. Alternate locations for smaller-scale data centers may be MUCH more viable now, hence a swarm of implementations in the market versus mega-monolithic builds only by those with bleeding-edge intentions. Colocated data centers and less power-hungry installations can be more tightly integrated by a wider set of infrastructure providers and end users.

  4. Cost pressures to seek value are finally here. Margins should be questioned and reevaluated to support viable business cases. If you only had bleeding edge options, your budget was binary. Now it’s a spectrum, with commoditization at the lower levels (and expected limited performance) - fit for purpose.

  5. There is at least one additional bifurcation in AI data science now. This means individuals within this space will be building their world view and experience in one of two channels (seek bleeding-edge tech to do more with more, OR seek value-oriented use cases that require only minimums). There will be more professionals available, but their skillsets will be more segmented.

This is all to say the market for AI is getting more mature. The market reaction today is a big indicator of this.

34 Likes

First, I liked your post; you made some really good points.

Well, Nvidia already has a Grace-Blackwell solution for the desktop:

Yeah, not the GB200 you were probably referring to, but even so.

Also, I think it’s worth putting into perspective that even DeepSeek’s own claim is that it used 2,048 GPUs. I don’t know of any good-enough model that can be trained in reasonable time on any single GPU. I think the question becomes performance and power consumption per price AND setup costs and complexity. Setting up 2,000 Blackwell GPUs is probably cheaper overall than 5,000 Hoppers or 7,000 AMDs, etc.
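
As a toy illustration of that trade-off, here is the shape of the calculation. Every price, relative-speed, and power figure below is a hypothetical placeholder, not a vendor spec:

```python
# Toy cluster-cost comparison (all prices, speeds, and power figures are
# hypothetical placeholders, not vendor specs).
clusters = {
    #  name        count  $/GPU    relative speed  kW/GPU
    "Blackwell": (2_000,  50_000,  2.5,            1.2),
    "Hopper":    (5_000,  30_000,  1.0,            0.7),
}

for name, (count, price, speed, kw) in clusters.items():
    capex = count * price       # total hardware spend
    compute = count * speed     # total cluster throughput, arbitrary units
    power = count * kw          # total power draw
    print(f"{name}: capex=${capex/1e6:.0f}M, "
          f"compute={compute:,.0f} units, power={power:,.0f} kW, "
          f"$/compute={capex/compute:,.0f}")
# Under these made-up numbers, the smaller Blackwell cluster delivers the
# same total compute for less capex and less power, with fewer boxes to wire.
```

The point isn’t the specific figures; it’s that fewer, faster GPUs can win on total cost and complexity even when each unit costs more.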

So while the DeepSeek folks used what they had on hand (and there is ongoing debate about whether they quietly tapped into the 10,000 H100s reportedly available to them), the same techniques should apply to smaller clusters of the faster GPUs. And while costing more per GPU, the overall system cost (and complexity!) should go down as Nvidia continues to produce faster chips and systems.

As for the “in your pocket” comment, there might be some advantages to DeepSeek’s approach on the inference side as well, as discussed here:

16 Likes

This short video from Dave’s Garage is a decent watch:

Although he does conflate training costs with data center setup costs, Dave says he’s run DeepSeek on a few of his own personal computers (he’s quite the geek, btw) with good performance, so the DeepSeek upside of requiring less inference hardware is confirmed. And he was able to get it to talk about Tiananmen Square on his own hardware, so the open source model appears not to be politically constrained.

15 Likes

From

  • Big Iron to
  • Minis to
  • PCs to
  • Laptops to
  • Smartphones?

The above development could not have happened without a lot of infrastructure:

  • Modern computer languages
  • WWW
  • App stores

Yet a lot stays the same: the only way to manage such complexity is to atomize the whole into manageable chunks. DeepSeek does that. Recently Elon Musk managed to break the coherence barrier, but with the DeepSeek architecture it might not be necessary.

What I find clickbaity is the claim that China is going to eat America’s AI lunch. The model being open source means that everyone can copy it. All technologies have two tribes, providers and users. DeepSeek hurts (some) providers while benefitting users.

LLMs have run out of data. Has Tesla run out of road data? No, it is being created and collected daily by millions of agents, a.k.a. cars. The same with humanoid robots: they will be collecting real-world data every minute they are working. Can Tesla use the DeepSeek architecture? Elon will figure it out, which brings me to what my subconscious was working on last night.

  • Experts

The DeepSeek architecture reminds me of OOP. Every OOP class is an expert. The big difference is that in OOP the coder guides the flow to the appropriate class, while in DeepSeek it’s the data itself that does the guiding. Fascinating!

The Captain

20 Likes

Going back to the interview with Dylan Patel: he said compute is being shifted from the training side to the inference side, and most of the data centers are now being built for inference. On the inference side, moving to an MoE (Mixture of Experts) architecture is a way to cheapen the compute cost, since not every parameter must be computed for every token. This allows for a larger number of parameters without having to compute all of them for every token. MoE was arguably brought to the mainstream via GPT-4 (it is highly speculated that GPT-4 used MoE), and open source MoE models such as Mixtral followed. DeepSeek just produced an MoE model superior to the previous ones.
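
To illustrate the routing idea, here’s a toy sketch of top-k expert gating. This is not DeepSeek’s actual architecture; the dimensions, random weights, and gating scheme are all made up purely to show why only a fraction of the parameters is touched per token:

```python
import numpy as np

# Toy Mixture-of-Experts layer: a router scores experts per token, and only
# the top-k experts actually run, so most parameters stay idle per token.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

router = rng.normal(size=(d_model, n_experts))                   # gating weights
experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.02  # expert FFNs

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (d_model,) hidden state for one token."""
    scores = x @ router                         # (n_experts,) routing scores
    chosen = np.argsort(scores)[-top_k:]        # indices of the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                    # softmax over chosen experts
    # Only 2 of the 8 expert weight matrices are used for this token:
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=d_model)
out = moe_layer(token)
print(out.shape)  # (64,) -- same output shape, ~1/4 of expert params computed
```

That’s the whole trick: total parameter count can grow with the number of experts, while per-token compute scales only with the handful of experts the router picks.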

While MoE has reduced the cost of inference, it hasn’t reduced it to a negligible cost, which means the chips that are cheapest to run will still be winners under this new paradigm. Nvidia’s CEO says their competitors’ total cost of operation would still be more expensive than Nvidia’s even if they gave their product away. So as the world continues to scale into AI, it will still be cost-conscious and want to go the cheapest route, which leads back to Nvidia and the other AI hardware companies. Sure, you can run AI on your home computer, but it will be a worse product, slower and more expensive than running it in a data center designed from the ground up for AI.

Looking to the future of AI, there is a possibility of another type of disruption similar to MoE, called the Memory Layer. Memory Layers are being held back by today’s memory technology, so as memory gets better it might lead to a transition to Memory Layers, with the potential to create new winners and losers in the AI field.

Drew

16 Likes

As always, Jamin Ball’s analytical message is compelling.

Gray

What a last few days in AI land! DeepSeek came out with R1 and it created huge ripple effects in the tech world, maybe culminating with the DeepSeek app shooting up to #1 in the app store (is this real? The result of bot farms downloading en masse?).

40 Likes

This is the paradigm enhancement.

Investing criteria are (quixotically) now much more complex.

How to factor in the reduced breadth of the top-margin, top-flight competitors? (NVDA structures are still essential, but the winners’ space is quite undefined. There are far more consumer-grade, basic-needs applications than Palantir Foundry users. What is the TAM of each? Palantir AIP’s is on the order of ~$1T.)

What part of the market is served by PUBLICLY traded companies vs cottage industry private consultants?

From a hardware perspective, this is a growth driving development.

Purchased additional shares in premarket for NVDA

15 Likes

I like to keep things simple, where possible:

  1. I would be absolutely shocked, and do not believe, that this Chinese invention is something that, just for example, the NVDA people had no idea about and that threatens their very business model. A recent interview with CEO Huang in Wired magazine was very interesting and amazing. My bet: NVDA is very, very smart and very, very good at what they do, and this did not catch them unawares.
  2. Similar analysis for META MSFT GOOG AMZN and others.
  3. There are so many examples of Chinese companies being less than transparent (and oftentimes outright fraudulent) with respect to various financial records, claims, earnings reports, and the like…that it would, IMO, be insane to just believe anything any Chinese company claims without outside, verifiable proof.
  4. I added NVDA on 1-27-25. Shares and calls.
26 Likes

I agree, however, there is a novel approach out there that will launch an AI assignment tomorrow from a uniquely new platform constructed by a cumulative knowledge base that does not even exist today. Effectively accessing this exploding knowledge base will drive enormous reductions in processing power mandates in the future.

5 Likes

I am of the same belief. I had a small toehold in NBIS for grins, now at a small loss. This morning I doubled up my holding. My reasoning is that most of their data center build is ahead of them in 2025. They have the capital and they have the skills. IF the DeepSeek story really plays out, NBIS can now buy less hardware and compute more. They are targeted at small-to-medium clients, so I expect the result to be more consumption offered at a lower price.

-zane

12 Likes

Just to be clear, DeepSeek R1 runs on home PCs (in its smaller, distilled versions). And it runs well if you’ve got an Nvidia graphics board. Dave Plummer says it runs well on the $250 Jetson Orin Nano, too:

And if you want, you can install and run it on your own PC (even air-gapped):

So, DeepSeek is not only cheaper to train, it’s dirt cheap to run.
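
For anyone who wants to try this, here’s a minimal sketch of querying a locally running copy. It assumes you’ve installed Ollama and already pulled one of the distilled deepseek-r1 tags; the model tag and prompt below are just examples:

```python
import json
import urllib.request

# Query a local Ollama server (default port 11434) running a distilled
# DeepSeek R1 model. Assumes `ollama pull deepseek-r1:7b` was run beforehand.
payload = json.dumps({
    "model": "deepseek-r1:7b",          # example tag; pick one your GPU fits
    "prompt": "Why is the sky blue?",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # The reply includes the model's <think>...</think> reasoning trace.
    print(json.loads(resp.read())["response"])
```

No cloud account, no API fees; everything runs on your own hardware, which is the whole point.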

8 Likes