Deepseek, some random thoughts

DeepSeek, the new state-of-the-art reasoning model, has upended the Silicon Valley AI scene. A few young kids who really wanted to develop a quant trading model for China eventually ended up developing new algorithms for LLMs. Pretty impressive :slight_smile: With that, some random thoughts, in no particular order.

  • We heard from NVIDIA to every HPC player how huge clusters (100K+) of GPUs are required to train these models, how big their clusters are, etc.
    • Scale is an economic moat, not an innovation moat. If anything, it works the other way around: those who are deprived innovate furiously.
  • The US government’s attempts at export controls, trying to slow down China, actually failed.
    • It is time to question the CHIPS Act and now STARGATE.
    • Trying to eliminate a few hundred government employees on ideological grounds, with great fanfare about saving money, while we spend hundreds of billions (cumulatively over a trillion) and have nothing to show for it. When will people question this?
  • The myth that the Chinese are copycats and cannot create original, innovative tech is broken.
  • We have seen Elon cutting costs on rockets and India’s Mangalyaan (a space probe to Mars) built on a shoestring budget; it repeatedly shows that old, bloated enterprises cannot be justified with “American exceptionalism”.
    • This extends to the Pentagon and most of the aerospace and defense companies. We need serious innovation and cost cutting here. The US certainly doesn’t need to spend so much on defense.
10 Likes

Have you used it? Or are you just repeating what someone else has said?

1 Like

Played with it a bit. Reading the paper on the model. Of course I have listened to / read a few folks I consider very smart on AI, including academic, professional, and VC voices. But these are my own thoughts. For example, on the moat: having said/written that, it occurred to me that there is nothing to prove or disprove that DeepSeek couldn’t get even better with a bigger fleet of GPUs.

I am on the side of wait and see. I am not so sure they are all that great. If, as you say, they could get even better with a bigger fleet of GPUs, well, how much better should X, Meta, and OpenAI be doing right now, since they already have those bigger fleets?

Here are some of the differences from the models used by US AI companies. Just to be clear, I am cutting and pasting this from someone who is essentially building an open-source LLM.

  • Use 8-bit instead of 32-bit floating point numbers, which gives massive memory savings
  • Compress the key-value cache, which eats up much of the VRAM; they get 93% compression ratios (some rough math on these first two points is sketched right after this list)
  • Do multi-token prediction instead of single-token prediction, which effectively doubles inference speed
  • A Mixture of Experts model decomposes a big model into small models that can run on consumer-grade GPUs
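
To make the first two bullets concrete, here is a rough back-of-the-envelope sketch in Python. All of the model dimensions below are made up for illustration; they are not DeepSeek’s actual configuration, and real KV-cache compression (their latent-attention trick) is more involved than just shrinking a dimension.

```python
# Rough back-of-the-envelope math for the first two bullets above.
# All dimensions are made up for illustration; they are NOT DeepSeek's real config.

def kv_cache_bytes(layers, kv_width, seq_len, bytes_per_value):
    """Memory needed to cache keys + values for one sequence."""
    return layers * 2 * kv_width * seq_len * bytes_per_value  # 2 = keys and values

layers, kv_width, seq_len = 60, 8192, 32_768   # hypothetical model and context size

fp32 = kv_cache_bytes(layers, kv_width, seq_len, 4)   # 32-bit floats
fp8 = kv_cache_bytes(layers, kv_width, seq_len, 1)    # 8-bit floats: 4x smaller
# A "93% compression ratio" roughly means caching a much smaller latent per token:
latent = kv_cache_bytes(layers, int(kv_width * 0.07), seq_len, 1)

for name, size in [("FP32 KV cache", fp32),
                   ("FP8 KV cache", fp8),
                   ("FP8 + compressed latent", latent)]:
    print(f"{name}: {size / 2**30:.1f} GiB")
```

The point is simply that bytes-per-value and cache width multiply together, so cutting both is where the “massive memory savings” come from.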

The above covers the key differences that help DeepSeek use less memory and compute. Separately, and I am paraphrasing my understanding here, so it could be technically incorrect:

DeepSeek can stop and restart their training run without completely starting over. That is, if OpenAI starts training and realizes the hallucination rate or failure rate is higher than expected and they need to make some adjustments to their model/algorithm, they essentially have to restart the training, while DeepSeek doesn’t. I don’t know model training deeply enough to understand why, or how DeepSeek is able to achieve this.

But this is a big advantage.
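
For what it’s worth, the basic mechanics of stopping and resuming a run usually look like the checkpointing sketch below: periodically save the model and optimizer state, then reload it on restart. This is a generic PyTorch-style sketch, not DeepSeek’s actual pipeline, and it doesn’t capture whatever they do that makes mid-run adjustments cheaper.

```python
# Minimal checkpoint/resume sketch with PyTorch. Generic practice, not DeepSeek's code.
import os
import torch
import torch.nn as nn

CKPT = "checkpoint.pt"                          # hypothetical path

model = nn.Linear(128, 128)                     # tiny stand-in for a real LLM
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
start_step = 0

if os.path.exists(CKPT):                        # resume instead of restarting from scratch
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["optimizer"])
    start_step = state["step"]

for step in range(start_step, 1000):
    x = torch.randn(32, 128)                    # fake batch, just for illustration
    loss = (model(x) - x).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 100 == 0:                         # periodically save progress
        torch.save({"model": model.state_dict(),
                    "optimizer": opt.state_dict(),
                    "step": step}, CKPT)
```

The hard part at frontier scale is doing this across thousands of GPUs without losing throughput, which is where the real engineering lives.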

4 Likes

It could be, if true. I am just skeptical that DeepSeek is better than X, OpenAI, or Meta.

Why are you skeptical?

When you type a sentence into a chatbot, it first has to be converted into vectors: the text is tokenized and each token is turned into a vector and fed to the model. Generation then happens one token (roughly one word) at a time, with each new token appended to the sequence and fed back in to predict the next. DeepSeek, for example, can use multi-token prediction, which brings significant improvements.
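
Very roughly, the speedup comes from needing fewer model calls per generated token. The toy sketch below only illustrates that bookkeeping; `predict_next` is a made-up stand-in for a real model, not DeepSeek’s multi-token-prediction head, and in practice the extra tokens have to be trained for and verified.

```python
# Toy illustration of single-token vs. multi-token decoding.
# `predict_next` is a dummy stand-in for a real model, NOT DeepSeek's API.

def predict_next(tokens, k=1):
    """Pretend model: returns the next k token ids (here just dummy values)."""
    return [len(tokens) + i for i in range(k)]

def generate_single(prompt, n):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n:        # one model call per new token
        tokens += predict_next(tokens, k=1)
    return tokens

def generate_multi(prompt, n, k=2):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n:        # one model call per k new tokens
        tokens += predict_next(tokens, k=k)     # roughly k-fold fewer calls
    return tokens

prompt = [101, 102, 103]
print(generate_single(prompt, 8))
print(generate_multi(prompt, 8, k=2))
```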

Multi-token prediction is just one example. Again, remember they released their model as open source. That means nothing stops Meta, for example, from taking it and fine-tuning it further, or adopting parts of its architecture into Meta’s Llama, etc.

If you are skeptical that no-name Chinese guys can achieve such a success, remember the VC model has always been about finding 2 or 3 extraordinarily talented people who can disrupt an existing product and deliver 10x improvements. LLMs deviated from this because, in order to catch up to Google, the OpenAIs of the world built large-scale training infrastructure. There is nothing new about a few smart individuals upending existing players. If you think they cannot achieve this simply because they happen to be Chinese, remember that knowledge is not the monopoly of any region, country, religion, or race…

1 Like

If any of this is true about DeepSeek, I see this as a bubble-bursting moment.

1 Like

A few data points.

  • DeepSeek is the #1 download in the App Store
  • Aravind, the CEO of Perplexity, has already committed to using DeepSeek in their product
  • This is the most revealing data point: look for inference costs from existing AI players to come down in the coming week(s)
    • I have already posted in the LL how $BABA has slashed its prices by 85%. When your competitors are competing with you on price, that means they are not providing a superior product to yours. At the least, your product is as good as theirs.
1 Like

I am skeptical because OpenAI has been at this longer, with more resources. If it were that easy to overtake them, then some other company in Silicon Valley would have done it. After all, you don’t think that only the Chinese are hungry to get ahead, do you?

4 Likes

I thought we were talking about LLMs, not inferencing.

I asked " I am specifically looking at your ability to course correct even before you provide me the response question to both DeepSeek and chatGPT to understand how their architecture differs and works… I have pasted both responses in the below link. See for yourself.

Silicon Valley companies are not constrained by capital or access to compute (GPUs), so they fight using traditional tools. When you fight against a much bigger, more powerful army, what do you do? You use guerrilla tactics, or you innovate in how you engage your army. That is “necessity is the mother of invention.”

1 Like

Sounds to me like ChatGPT was doing that a year ago. I am not an expert, and the only way we can actually compare the two is to have an expert compare them.

ChatGPT says…
my ability to dynamically adjust and course-correct during a conversation

What this means is that ChatGPT requires further inputs… OTOH, DeepSeek can “course correct” as it generates the response, i.e., during inference.

ChatGPT was a watershed moment in AI; so is DeepSeek. We will have many more. US companies Google, Meta, OpenAI, Anthropic… all have deep pockets and deep talent. We will see…

2 Likes

Marc Andreessen of a16z.com says…

1 Like

I have no opinion on DeepSeek. As a coder, the first two items are spot on! When I started, computers were very limited and we were forced to optimize. By the time of the dot-com bust, hardware had become so abundant that George Gilder was saying, “Waste abundance with glee.” Up to neural network AI, most software was algorithmic. Neural network computing is an entirely different paradigm that relies on abundance instead of on boolean (human) logic (algorithms). Put another way, neural networks work like the universe does: everything huge is made up of the tiniest of the tiny.

Item three is multitasking, a good use of scarce resources.

Item four is not a long-term solution. AI has to stand on its own two feet. Sorry for the mixed metaphor. :grin:

Despite DeepSeek, size matters. Intelligence is an emergent property of the complex systems we call brains. Bigger brains tend to be more intelligent than smaller brains.

The Captain

3 Likes

All the Techs are down in pre-market trading.

intercst

1 Like

Actually this is very, very sustainable. Currently, models are trained on everything and built as one massive system. What DeepSeek did is build many experts, and these experts are called only when needed. Instead of ~1.7 trillion parameters active at once, they have only ~37B parameters active. You have a huge team, but you call in the experts only when needed. Also, you can add experts and fine-tune experts. Just to give an example: if tomorrow you decided to add genome data, you could add it without needing to completely retrain the model.

This is not only sustainable; expect it to become the standard, or at least a widely adopted, approach.
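
To make the “call the experts only when needed” idea concrete, here is a toy Mixture-of-Experts routing sketch in Python with NumPy. The dimensions, expert count, and router are all invented for illustration; this shows the general MoE idea, not DeepSeek’s actual router or its load-balancing tricks.

```python
# Toy Mixture-of-Experts routing: only the top-k experts run for each token.
# Sizes, expert count, and the router are invented for illustration; not DeepSeek's code.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2

router_w = rng.normal(size=(d, n_experts))                      # scores each expert for a token
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]   # each expert is a small network

def moe_layer(x):
    scores = x @ router_w                                       # one score per expert
    chosen = np.argsort(scores)[-top_k:]                        # indices of the top-k experts
    weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    # Only top_k of the n_experts do any work for this token; the rest stay idle,
    # which is the "only a fraction of the parameters active" idea from the post above.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=d)
print(moe_layer(token)[:4])                                     # first few output values
```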

2 Likes