DeepSeek, some random thoughts

Yes, that’s why all the buzz.

We’ll have to agree to disagree on this, which does not mean it will not be used. Curating data already does some of this. My point is that it’s not a good long-term paradigm.

AI stocks freaking out:

The Captain

AI companies have already announced plans to adopt it!!!

I wonder if we are talking about the same thing. Of the four items listed, I certainly agree with the first two, which can be adopted immediately. What I don’t think is a long-term paradigm is curating data with expert systems. It has flaws similar to using simulated data.

The Captain

It is not curating data. Let me try again. When you train on a vast data set, that is trillions of tokens. Most of these models keep all parameters on. The DeepSeek architecture is 671B parameters in size but activates only 37B at once. Instead of having one very big brain that is always fully on, they have experts that are called when required. On your comment that “the bigger the brain, the better”: their overall brain (knowledge, parameter count) is bigger, but it is not all online at once. This is not curating data. It is optimizing the memory structure, reducing the cost of inference.
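
To make that concrete, here is a minimal sketch of a Mixture-of-Experts layer in PyTorch. The sizes, the top-2 routing, and all names are illustrative assumptions, not DeepSeek’s actual configuration; the point is only that the router activates a couple of experts per token while the rest of the parameters sit idle.

```python
# Minimal Mixture-of-Experts sketch. Sizes are made up for illustration.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward block; most sit idle per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden),
                          nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (n_tokens, d_model)
        scores = self.router(x)                # (n_tokens, n_experts)
        weights, picks = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)      # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token; the other experts'
        # parameters exist but are never activated for that token.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picks[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
print(layer(torch.randn(10, 64)).shape)        # torch.Size([10, 64])
```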

This is the part I object to.

The Captain

Today most databases and applications work that way. They don’t keep all the data in memory; they load it when required.
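
A toy sketch of that idea in Python, with made-up names and nothing assumed about any particular database: keep a bounded working set resident and fetch everything else on demand.

```python
# On-demand loading, like a database buffer pool: only `maxsize` results
# stay resident at once. `load_page` is a made-up stand-in for a disk read.
from functools import lru_cache

@lru_cache(maxsize=4)
def load_page(page_id: int) -> bytes:
    return f"contents of page {page_id}".encode()

for pid in [1, 2, 1, 3]:
    load_page(pid)                 # the repeated id is served from cache
print(load_page.cache_info())      # shows hits/misses/current size
```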

Right but I am waiting till the buzz dies down to see if it is true.

How does that relate to experts called when required? Who are these experts?

The Captain

A software improvement in neural-net model architecture is another, and I would guess very, very important, way to improve. This is, I’m sure, a very active area of R&D for all companies, and really the only way forward for a small company, because a small company by definition cannot afford piles of expensive hardware. Instead of “let’s just throw as much data and hardware as we can at our model,” another approach is “let’s make our model smarter with a more efficient architecture.”

Sub-models, each specializing in some element of the task.

It is efficiently done as an afterthought. It is not a new app. It is not a killer app.

It is not new hardware. It is on old hardware.

Claims that NVDA won’t need to produce new hardware make no sense. The current hardware and much of the old hardware support all flavors of LLMs.

The reason the Naz is down is Washington DC.

Kiss your behind goodbye if you are long this market.

Stop thinking you have facts when you read the news.

Didn’t you say that same thing over a year ago? This time is different?

Ye of little faith.

No, I said we would have a down market a year ago. We did, and then the market recovered.

This is the fire and brimstone talk. Big difference.

But hey I do not care about your money.

Oh, your definition of a down market and mine are different. I was up 68 percent but maybe in an up market I would have been up 150 percent?

Perhaps I was not clear. Smart folks like Andrej Karpathy, Aravind Srinivas, Yann LeCun… I can go on. These folks have benchmarked the model and found it is as good as, if not better than, Gemini 2.0 or ChatGPT Pro. It is miles better than Meta’s Llama. Don’t be dismissive of this technology breakthrough.

I am not sure what the long-term impact is. Remember, this has now wiped out more than $1T in market cap. A lot of technical levels are broken. We have earnings coming… interesting days. I wish I had closed my $NVDA position instead of writing the covered call. Trying to squeeze out a few dollars seems stupid now… :smile:

The first thought that came to my mind when I heard about this was that we have been building larger and larger neural nets for these models, but maybe that is not how our brains work. Likely our brains are not one gigantic neural net but many smaller ones, each for specific tasks, with an ability to call upon them as required. And maybe DeepSeek figured that out. Pure speculation.

Likely speculation. Brains have specific parts for specific jobs, which allows for multitasking.

The Captain

I bought NVDA LEAPS today.

The idea that you have several small brains does not work in most real-world scenarios. There are too many feedback loops to inject logic and route between them.

You should read more about DeepSeek. This description is incorrect.

Also, using 8-bit instead of 32-bit precision is a red flag. It is lossy. Maybe that is tolerable in certain use cases, but in mission-critical ones it is a no-go.
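
For what it’s worth, the loss is easy to see: round-trip FP32 weights through a simulated 8-bit representation and measure the error. This sketch mimics plain linear int8 quantization, an assumption for illustration, not DeepSeek’s actual FP8 training scheme.

```python
# Round-trip FP32 values through a simulated 8-bit representation
# (linear int8 quantization) and measure the reconstruction error.
import numpy as np

weights = np.random.randn(1_000_000).astype(np.float32)

# Map the float range onto 255 signed levels around zero.
scale = np.abs(weights).max() / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
restored = quantized.astype(np.float32) * scale

err = np.abs(weights - restored)
print(f"max abs error:  {err.max():.6f}")
print(f"mean abs error: {err.mean():.6f}")
```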