This is correct. DeepSeek has made algorithmic advancements, and these are real. If things were status quo, there would be reduced demand for Nvidia's super-expensive Blackwell chips; everyone would rush to AMD, Google's TPU, or Amazon's Trainium, since those suffice for today's needs.
However, things are NOT status quo. Everyone needs more and more and more. Whoever blinks first in this spending war loses, and it is not a small loss but decimation.
Here is the New York Times story written by Kevin Roose, a San Francisco-based writer who has been at this for over a decade and who has produced documentaries, podcasts, and books on all things technological. It is a gift article, available to everyone. It addresses many of the issues brought up in this thread.
I'll excerpt a couple of paragraphs:
Some industry watchers initially reacted to DeepSeek’s breakthrough with disbelief. Surely, they thought, DeepSeek had cheated to achieve R1’s results, or fudged their numbers to make their model look more impressive than it was. Maybe the Chinese government was promoting propaganda to undermine the narrative of American A.I. dominance. Maybe DeepSeek was [hiding a stash of illicit Nvidia H100 chips], banned under U.S. export controls, and lying about it. Maybe R1 was actually just a clever re-skinning of American A.I. models that didn’t represent much in the way of real progress.
Eventually, as more people dug into the details of DeepSeek-R1 — which, unlike most leading A.I. models, was released as open-source software, allowing outsiders to examine its inner workings more closely — their skepticism morphed into worry.
But I do think that DeepSeek’s R1 breakthrough was real. Based on conversations I’ve had with industry insiders, and a week’s worth of experts poking around and testing the paper’s findings for themselves, it appears to be throwing into question several major assumptions the American tech industry has been making.
The first is the assumption that in order to build cutting-edge A.I. models, you need to spend huge amounts of money on powerful chips and data centers.
There are other, more technical reasons that everyone in Silicon Valley is paying attention to DeepSeek. In the research paper, the company reveals some details about how R1 was actually built, which include some cutting-edge techniques in model distillation. (Basically, that means compressing big A.I. models down into smaller ones, making them cheaper to run without losing much in the way of performance.)
DeepSeek also included details that [suggested] that it had not been as hard as previously thought to convert a “vanilla” A.I. language model into a more sophisticated reasoning model, by applying a technique known as reinforcement learning on top of it. (Don’t worry if these terms go over your head — what matters is that methods for improving A.I. systems that were previously closely guarded by American tech companies are now out there on the web, free for anyone to take and replicate.)
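For anyone curious what "distillation" actually looks like in practice, here is a minimal PyTorch sketch of the standard technique, my own illustration from the public literature, not DeepSeek's actual training code: the small "student" model is trained to match the softened output distribution of the big "teacher" model.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard knowledge-distillation loss (Hinton et al. style).

    The teacher's logits are softened with a temperature so the student
    learns the teacher's full output distribution, not just its top pick.
    """
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence between student and teacher; the t*t factor keeps
    # gradient magnitudes comparable across temperature settings.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
```

In practice this loss is usually mixed with the ordinary next-token loss on real data. The point is that the expensive knowledge lives in the teacher's outputs, not its weights, which is why a much smaller model can keep most of the performance.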
I am guardedly trustful of this report, and assume we won't find ourselves in a Milli Vanilli moment a few weeks from now, given that several mainstream and science reporters have now weighed in. If true, this will be a watershed moment for AI, and while there will be some for whom the very best at the tippy-top is required, for most a McDonald's burger will be "good enough" if it's 10 times cheaper.
(Seriously, does Facebook really need the same level of precision as the Pentagon? Or the call center at Dick's Sporting Goods?)
I see major economic retrenchment here (assuming it's true), as well as geopolitical upheaval, given the time and energy the US/Silicon Valley has put into trying to corral this development for itself.
I think the AI investment thesis is not dead but is now significantly different. With DeepSeek, I believe a big chunk of the commercial applications of AI will no longer need top-of-the-line NVIDIA chips or huge power consumption.
The sweet spot for chip manufacturing may now be in less expensive, more easily manufactured second-tier chips, and data centers may no longer be such a huge drain on grid power or need dedicated nuclear plants.
I’m just skeptical of any Chinese company/government announcement. I don’t dismiss it outright, but I comes with a truck load of salt.
My cynical side asks: has anyone looked into whether someone placed a huge short position on NVIDIA, etc., a while back? Kind of like the short sales reportedly placed on airline stocks before 9/11.
There is also the angle where training data is being used up. Let's say China wants more data from more sources. What better way to get it than to put out an LLM that is great, easy to use, and free, so everyone in the whole world can use it? Just how much data would that give them?
Last year would have been negative for the market and the economy, given the housing cycle. But we had a strong market because of AI investments.
Now, if we don’t need to spend all those 100’s of billions, just the top players alone comes to over $ 500 B that has an economic impact. It is not just data center build, but electricity, transmission, etc all the second order and 3rd order activities will slow down.
I am not saying it is immediately going to grind to halt, but it is not just NVDA but lot of utility companies dropped 15%, 20%. That’s not how utility companies trade.
I don’t believe that will happen. The trade was built upon the best AI companies. Deepseek looks like it will be lower tier and the best like OpenAi and Meta and Google will still be competing for the top.
The reason I state this is because Deepseek was trained on public LLM’s so while it can be good enough it will never be the best of the best. So now we have two tiers of AI. One Tier for companies that need the best and the other tier for you and I.
It was on the order of 4x to 8x more expensive than claimed if you include the prior versions created to get to this current product.
The insight is in whether the compression of the process holds up. Or does it?
OpenAI offers a few flavors of GPT. They have known all along that there were different quality versions.
I do not trust the press to assess things well. This is a complex market. I am not just talking about NVDA movements or AI. Valuations throughout the market are risky.
DeepSeek is a far better model than Llama and on par with OpenAI's ChatGPT Pro. In fact, when it comes to non-English languages, the model beats even ChatGPT Pro.
The question is: can they continue to outperform? Can SVC, with their vast resource pool (money and people), catch up? After all, they have published their model, so what can you take and run with?
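On the "take and run with it" point: because the weights are open, anyone can pull them down and probe them locally. A minimal sketch using Hugging Face transformers; the model id here is one of the small distilled R1 variants and is an assumption on my part, so check the hub for the exact repo names:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small distilled R1 variant; repo name is an assumption, verify on the hub.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Run a single prompt through the model and print the decoded output.
inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```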
From my friends at Meta: they have established a war room, like a P1 production-outage war room, to pore over R1 and see what they can take from it. While Llama is open source, it is also at the biggest risk of developers dumping it en masse.
Here is the problem: it was trained on open-source models and is very susceptible to hallucinations, even more than other models. So it will never be the best, but maybe good enough for everyday use for those who do not need the best.
Tom Bilyeu explains the DeepSeek process in the first 8 or 10 minutes.
If I understand correctly, he says ChatGPT was trained (at great expense, on the latest GPUs), and DeepSeek queried ChatGPT over and over, thousands of times, until DeepSeek's results matched ChatGPT's.
I.e., DeepSeek's results are a derivative of ChatGPT.
Am I understanding that correctly?
Is Bilyeu correct?
Bilyeu goes on to say that the innovative first movers (OpenAI, Meta, xAI, etc.) are spending the big bucks to train models, and that in the future the DeepSeek process will be used to mine those expensive models and generate low-cost results (sketched below).
Economics. The big models will spend the $ to produce the product.
DeepSeek models will parasitize the original and “suck the value”.
I’m not seeing DeepSeek explained this way by anyone else.
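If Bilyeu's description is right, the "mining" step is just harvesting a strong model's answers and fine-tuning a cheap student on them. Here is a hypothetical sketch; the client, teacher model name, and whether DeepSeek actually did this are all assumptions (and OpenAI's terms of service forbid using outputs to train competing models):

```python
import json
from openai import OpenAI  # official OpenAI Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def harvest_teacher_answers(prompts, out_path="teacher_pairs.jsonl"):
    """Collect (prompt, answer) pairs from an expensive 'teacher' model.

    A cheaper student model would then be fine-tuned on this file with
    ordinary supervised learning -- no costly pretraining required.
    """
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model="gpt-4o",  # placeholder teacher; any strong model works
                messages=[{"role": "user", "content": prompt}],
            )
            answer = resp.choices[0].message.content
            f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```

The economics of the thread follow directly from this: the teacher's owner pays for the expensive training run, while the harvester pays only per-query API costs plus a cheap fine-tune.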
That is how I understand it too. That is why I am saying DeepSeek cannot be better, only good enough. Now other companies will do the same thing, so it is going to be commoditized.
I am not putting down what they did; I think it was very smart. But it can't be the best, because it will always train on older models, though it could be good enough. Except for the hallucinations. Maybe those will get better?