DeepSeek model impact on AI hardware companies

You can run a slimmed-down model on a home PC, just as you can run older, slimmed-down Llama models on home PCs. It's better than those, as you'd expect from a newer model. But the real gains come from running the full 671B-parameter model, and most home computers can't do that. Dave Plummer is using an $8,000 graphics card in an expensive computer to get 4 tokens a second (a word is about 1.3 tokens on average), which works out to roughly $9.45 per million tokens in electricity.
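If you want to sanity-check that $9.45 figure, here's a rough back-of-the-envelope calculation. The 4 tokens/second is from Dave's test; the ~1 kW system draw and ~$0.14/kWh electricity rate are my own assumptions, picked to land near his number, not figures from the video:

```python
# Back-of-the-envelope electricity cost per million tokens.
# 4 tokens/sec comes from the post above; the ~1 kW system draw and
# $0.136/kWh rate are assumptions chosen to roughly reproduce the
# ~$9.45 figure, not numbers taken from Dave Plummer's video.

TOKENS_PER_SECOND = 4      # observed generation speed
SYSTEM_POWER_KW = 1.0      # assumed whole-system draw under load
PRICE_PER_KWH = 0.136      # assumed electricity rate, USD

seconds_per_million = 1_000_000 / TOKENS_PER_SECOND    # 250,000 s
hours_per_million = seconds_per_million / 3600          # ~69.4 hours
kwh_per_million = hours_per_million * SYSTEM_POWER_KW   # ~69.4 kWh
cost_per_million = kwh_per_million * PRICE_PER_KWH      # ~$9.44

print(f"~{hours_per_million:.0f} hours and ~${cost_per_million:.2f} of "
      f"electricity per million tokens")
```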

Utilization costs are dropping, but the cheapest way to get the highest quality is still equipment dedicated to AI. R1 shines because it lets AI solve problems that weren't solvable before, so it's opening up use cases and lowering prices, which lets more people get into it. In my opinion that will drive more demand for Nvidia chips, because they have the lowest total cost of operation.

Drew

16 Likes

This is interesting, but given that DeepSeek is free to use, what's the advantage of running the 7B model locally versus the 671B model within their browser or phone app?

2 Likes

Privacy. The app sends data to their servers in China.
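If you want prompts to stay entirely on your own machine, a local setup is straightforward. A minimal sketch against Ollama's local REST API, assuming Ollama is installed and you've already pulled one of the R1 distills (the deepseek-r1:7b tag is my assumption; check the Ollama library for the current name):

```python
# Minimal sketch: query a locally running Ollama server so prompts never
# leave your machine. Assumes Ollama is installed and a distilled model has
# been pulled; "deepseek-r1:7b" is an assumed tag -- check the Ollama
# library for the current name.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    json={
        "model": "deepseek-r1:7b",           # assumed distill tag
        "prompt": "Explain in two sentences why mixture-of-experts is efficient.",
        "stream": False,                     # return one JSON object, not a stream
    },
    timeout=300,
)
print(resp.json()["response"])
```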

13 Likes

Thanks! Makes sense. That's definitely a concern. Then again, I'm also concerned that even when downloading from Ollama there could be hidden vulnerabilities in the model that expose my PC.

2 Likes

Good explanations and examples.

6 Likes

I noticed that Raspberry Pi (RPI.L) was up 5% on January 27th and, surprisingly, did not sell off when almost all the semiconductor stocks sold off hard that day. Usually the market doesn't have that nuanced a take on these developments, although it also trades on a different exchange. It seems beneficial to their business that decent AI models can be loaded onto their chips, even if they run slowly. That may open up a lot of new use cases for their chips.

It is going to be an interesting earnings season, because we haven't yet heard from executives on how these AI model advancements will impact their businesses.

15 Likes

This is one of the best articles on DeepSeek's impact I have read, so I thought I would share it. It's by the CEO of Anthropic. It's ostensibly about export controls, but he explains everything else first.
Dario Amodei — On DeepSeek and Export Controls

15 Likes

Speaking as a professional developer who actively uses AI tools for work, I can confidently say that Claude 3.5 Sonnet from Anthropic remains the most capable AI for coding tasks, which is why I hold Dario's opinions in high regard.
While I’ve tested DeepSeek’s models, they don’t match Sonnet’s capabilities in my experience. This sentiment is also shared among my developer peers.
Given this conviction in Anthropic’s (and to a lesser extent OpenAI’s) technology, I actually increased my positions in ALAB and NBIS during the recent market dip. I’m already well-positioned in NVDA, so I didn’t add there.
The color that Dario added regarding costs is really valuable. While DeepSeek claims their model cost $6M to train, this isn't the massive cost disruption some make it out to be. The cost reduction is roughly in line with the industry's natural progression. And even if there were a breakthrough there, it wouldn't mean the AI race would slow down and fewer GPUs would be needed.

52 Likes

Why wouldn’t it? If there were a breakthrough there, wouldn’t VCs pause before funding OpenAI in the billions? Perhaps slow the spend?

It’s not obvious that spend would NOT slow. Perhaps not this quarter but over several years.

4 Likes

The AI arena is hyper-competitive. Even if training costs drop, being first to market or first to a key breakthrough remains “priceless” - as Jensen keeps reminding us. It’s effectively a “winner-takes-all” or “winner-takes-most” scenario, so firms and governments won’t suddenly pull back on spending. They’ll double down to stay ahead of rivals.
So AI use cases that were previously too expensive or speculative suddenly become viable - such as domain-specific large models, AI agents, and video generation. More efficient training and inference opens new frontiers rather than causing a slowdown.
If we look at the breakthroughs behind ChatGPT 3.5 or 4, they led to massive investment in AI. Why would a new breakthrough now lead to cuts in spending?

I guess my argument is that a cost breakthrough simply means that budgets go further. :smile:

24 Likes

I’m generally not a big fan of this podcast, but I have to admit this is actually a good discussion of what’s going on with DeepSeek from knowledgeable people (time cued to start at 15:38 in):

Basically:
• $6 million for the final training run could be true. But don’t confuse that with the data-center purchase and setup costs plus multiple development runs. It’s more like $6M versus tens of millions, not a billion.
• DeepSeek has access to ~50K Nvidia GPUs that were probably all used during development. It’s not unlikely they have more chips they don’t want to talk about.
• The DeepSeek team is super smart and very technical, and their writeups on what they did with PTX (Nvidia’s low-level GPU instruction set) are impressive.
• Consensus is that instead of China being 6-12 months behind the US, it’s now only 3-5 months behind.
• It’s not unlikely DeepSeek did some OpenAI O1 “distillation” - which breaks OpenAI’s terms of use. That means calling the O1 API over and over and feeding the outputs in to train R1 (a rough sketch of what that looks like follows this list).
• Kind of a comeuppance for OpenAI, which is being sued for stealing data (e.g., the NY Times) to train, to now have its data stolen for training.
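For anyone wondering what “calling the API over and over” would actually look like, here is a purely illustrative sketch. The prompt list, the “o1” model name, and the output file are my placeholders, not anything DeepSeek is known to have used, and doing this to train a competing model is exactly what OpenAI’s terms of use forbid:

```python
# Illustrative sketch of API-based "distillation" data collection: query a
# teacher model repeatedly and save prompt/answer pairs as training data for
# a student model. Model name, prompts, and file path are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Prove that the square root of 2 is irrational.",
    "Write a Python function that merges two sorted lists.",
    # ...a real pipeline would use millions of prompts
]

with open("teacher_outputs.jsonl", "w") as f:
    for prompt in prompts:
        reply = client.chat.completions.create(
            model="o1",  # placeholder teacher model name
            messages=[{"role": "user", "content": prompt}],
        )
        answer = reply.choices[0].message.content
        # Each line becomes one supervised training example for the student.
        f.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")
```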

17 Likes

And Microsoft and OpenAI are reviewing the possibility that DeepSeek called the OpenAI API.

If OpenAI outputs were material to building DeepSeek, then the cost to build DeepSeek equals
(cost to build OpenAI)* + (DeepSeek’s direct cost to build)

which is, of course, a large number.

*Or (just to attempt thoroughness) the cost of the parts of OpenAI that are material to DeepSeek. Even if all of the exact queries and outputs were known, I doubt this could be calculated or somehow separated from the total cost of OpenAI, but who knows. Maybe also include the cost to build the NYT content that was used in OpenAI.

4 Likes

An insightful philosophical article on DeepSeek and the AI movement.

Gray

Why DeepSeek Is Bullish for the World - Mauldin Economics

4 Likes

This is a little anecdotal, but… The company I work for has about 100,000 employees. They just sent an email stating that they are blocking DeepSeek due to security concerns after studying it. I imagine this is probably a pattern at most large companies. They haven’t blocked other products from Anthropic or OpenAI.

24 Likes

They should tap into one of the hundreds of instances running on US hardware owned by US companies.
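Most of those hosts expose OpenAI-compatible endpoints, so switching is mostly configuration. A minimal sketch; the base URL and model ID below are placeholders, substitute whatever your chosen US provider documents:

```python
# Sketch of pointing the standard OpenAI client at a US-hosted R1 deployment.
# The base_url and model ID are placeholders -- every OpenAI-compatible host
# documents its own values; the calling pattern is the same.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-us-host.com/v1",  # placeholder provider URL
    api_key="YOUR_PROVIDER_KEY",                    # placeholder credential
)

reply = client.chat.completions.create(
    model="deepseek-r1",  # placeholder model ID; providers name it differently
    messages=[{"role": "user", "content": "Summarize the CAP theorem briefly."}],
)
print(reply.choices[0].message.content)
```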

2 Likes