Long-form interview with Dylan Patel (semiconductor analysis: Nvidia)

Great discussion:

• 70% of all AI workloads run on Nvidia chips, and about 28% run on Google’s own in-house chips (thanks to Google Search and Google Ads, two of the largest money-making AI apps today, along with TikTok and Meta). Google doesn’t sell those chips, so if you look only at the AI compute people can actually purchase, roughly 98% of it is Nvidia.

• Google still buys Nvidia chips for Google Cloud - to rent GPU compute time to customers - probably because those customers want CUDA.

• Patel says Nvidia is dominant because of a "three-headed dragon":

  1. Software: “Every semiconductor company in the world sucks at software - except for Nvidia.”
  2. Hardware: Nvidia gets to the newest technologies first.
  3. Networking - connecting all those chips together (see the quote below).
    As Brad says, multiple competitive moats.
    Patel goes on to point out that the Blackwell racks are huge - 3 tons, and only Nvidia can do it all in-house.

“Building a chip is one thing. But building many chips that connect together, cooling them, networking them…is a whole host of things that other semiconductor companies don’t have the engineers for.”

• Blackwell’s performance per TCO dollar (total cost of ownership) is 5X Hopper’s.

• “The cost for delivering LLMs is tanking, which is going to induce demand.”

• Nvidia has a lot more software than just CUDA for training. But CUDA is essential for training, because training is the development stage: engineers are constantly trying new things, and it’s not worth their time to hand-optimize kernels themselves. They rely on CUDA and Nvidia’s development tools being fast and good enough right out of the box.
On the inference side - deployment - customers like Microsoft can see benefits to hiring engineers and tuning models to run on cheaper hardware, since those apps will run for six months or more - much longer than a single training run.

• Patel believes companies are upgrading their non-AI data centers to free up power to run new GPU installations in those same facilities. The new CPUs deliver more performance per watt and per rack, so consolidating existing workloads onto them frees rack space and power for new AI racks and workloads (a rough sketch of that math follows below).
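A minimal back-of-the-envelope version of that consolidation argument - every number here is a made-up placeholder, since the podcast gives no specifics; only the shape of the math matters:

```python
# Hypothetical illustration of the CPU-consolidation argument.
# Every number is a placeholder, not from the podcast.

old_cpu_racks = 400        # existing CPU racks (hypothetical)
old_rack_kw = 15           # power per old CPU rack, in kW (hypothetical)
perf_per_watt_gain = 2.0   # assume new CPUs do the same work at 2x perf/watt

# The same CPU workload now needs half the power (and roughly half the racks).
old_power_kw = old_cpu_racks * old_rack_kw
freed_kw = old_power_kw - old_power_kw / perf_per_watt_gain

gpu_rack_kw = 120          # a dense AI rack can draw 100+ kW (hypothetical)
print(f"Power freed: {freed_kw:,.0f} kW -> room for ~{freed_kw / gpu_rack_kw:.0f} AI racks")
# Power freed: 3,000 kW -> room for ~25 AI racks
```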

• Synthetic data generation is just getting underway and should improve training results beyond what training on today’s entire internet can deliver.

• “When you look at The Street’s estimates for capex, they’re all far too low…This whole ‘scaling is over’ narrative falls on its face when you look at what the people who know the best are spending on.”

• Nvidia’s source of capital is very different from Cisco’s back in the day. The private-market contribution today is much smaller (even accounting for inflation) than it was during the dot-com boom; today the money comes from the cash flows of the most profitable companies in the world.

• GPT-4 cost millions of dollars to train, but it’s generating billions of dollars in revenue.

• Consumers are paying 50X more per query now, but they’re getting value out of it because they’re getting things they couldn’t get before at any cost. His example is code development - spending more on the model is still cheaper than human coders. He gives examples of making $300k/year programmers 20% more efficient, or replacing 100 developers with 75 or 50 - those are “so worth using the most expensive model.”

“The cost for intelligence is so high in society.”
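To make that ROI bullet concrete, here’s a quick sketch in Python using the figures from the episode; the model cost per developer is my own hypothetical placeholder, since the podcast doesn’t price the usage:

```python
# Back-of-the-envelope ROI for "the most expensive model."
# Salary, efficiency gain, and headcount figures come from the episode;
# model_cost_per_dev is a hypothetical placeholder.

dev_salary = 300_000         # $300k/year programmer (from the episode)
team_size = 100
efficiency_gain = 0.20       # "20% more efficient" (from the episode)
model_cost_per_dev = 20_000  # hypothetical annual model spend per developer

# Scenario 1: keep the team, get 20% more output.
value_added = team_size * dev_salary * efficiency_gain
spend = team_size * model_cost_per_dev
print(f"Efficiency play: +${value_added:,.0f} of output for ${spend:,.0f} of spend")

# Scenario 2: replace 100 developers with 75 (from the episode).
savings = (100 - 75) * dev_salary - 75 * model_cost_per_dev
print(f"Headcount play: ${savings:,.0f} net annual savings")
```

Even if the hypothetical model bill were several times larger, both scenarios stay deeply positive - which is Patel’s point.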

• Memory costs are growing faster than GPU costs. Nvidia’s highest input cost is HBM memory, not what it pays TSMC.

• The only reason people buy AMD GPUs is that they pack more memory into the package. Patel: “Maybe we can’t design as well as Nvidia, but we can put more memory on it… The software isn’t nearly as good, the compute elements aren’t nearly as good, but by golly they’ve got more memory bandwidth per dollar.”

• AMD is missing software, and they won’t spend the money to build a GPU cluster of their own to develop software on - “which is insane.” Meta and Microsoft are helping them somewhat. But AMD’s share of total AI revenue will decline next year even as its revenue grows.

– More after I eat dinner —


Edit: I tried to make it clearer that I’m super excited to read what Smorgasbord has to write about this amazing BG2 podcast after dinner, and which parts I lifted from the article I used as a comparison.

An article came out today on Seeking Alpha saying mostly the same things as the BG2 podcast above. I read it, then watched the BG2 episode - twice each (I recommend listening to the BG2 at regular speed or slower, even if you read the article first 🤯).

(“Inference reasoning, per query with o3” is my shorthand for both.)

Seeking Alpha:

Nvidia Stock Is Set To Surge From OpenAI’s o3 Breakthrough (NVDA)

Key to Nvidia’s future success is the notion that scaling laws still have a way to go. While pre-training scaling is hitting a soft wall due to the data limit (except where synthetic data from machine learning can be functionally proven, yes/no), test-time compute scaling is still in its early stages. As reasoning-centric models proliferate, there is no intrinsic reason why we won’t keep pushing for more complex reasoning per query. The productivity gains made from true reasoning models will justify the large costs associated with running such queries.

With every new model that tests the boundaries of reasoning, Nvidia is in a perfect position to supply even larger compute clusters, faster networking, and more advanced orchestration software. Even the coding layer is dominated by Nvidia’s CUDA framework, which again puts the company at a huge advantage compared to competitors.

Best

Jason


From the BG2 podcast: “Nvidia’s highest cost is HBM memory, not TSMC.”

Per the Micron earnings report on HBM revenue: Micron doubled HBM revenue quarter over quarter.

This doesn’t necessarily mean prices will come down with the (undisclosed amount of) increased production. Does anyone here feel they have a finger on the pulse of HBM pricing?
