Long-form Interview with Dylan Patel (Semiconductor analysis: Nvidia)

Great discussion:

• About 70% of all AI workloads run on Nvidia chips, with roughly 28% on Google’s own TPUs (thanks to Google Search and Google Ads, two of the largest money-making AI apps today, along with TikTok and Meta). Google doesn’t sell its TPUs, so if you look only at the AI compute people are actually purchasing, it’s roughly 98% Nvidia.
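
A quick back-of-the-envelope check of how those figures fit together, using only the rough percentages quoted above (nothing more precise than that):

```python
# Rough check of the market-share math quoted above (all figures approximate).
nvidia_share_of_all_workloads = 0.70   # ~70% of all AI workloads run on Nvidia
google_internal_share = 0.28           # ~28% run on Google's own TPUs, which aren't sold

# Exclude Google's internal TPU workloads to get the "purchased" market.
purchased_market = 1.0 - google_internal_share
nvidia_share_of_purchased = nvidia_share_of_all_workloads / purchased_market

print(f"Nvidia share of purchased AI compute: ~{nvidia_share_of_purchased:.0%}")  # ~97-98%
```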

• Google buys Nvidia chips for Google Cloud - to rent GPU compute time to customers - probably because those customers want CUDA.

• Patel says Nvidia is dominant because of a "three-headed dragon":

  1. Software: “Every semiconductor company in the world sucks at software - except for Nvidia.”
  2. Hardware: Nvidia gets to the newest technologies first.
  3. Networking

As Brad says, multiple competitive moats.
Patel goes on to point out that the Blackwell racks are huge - 3 tons each - and only Nvidia can do it all in-house.

“Building a chip is one thing. But building many chips that connect together, cooling them, networking them…is a whole host of things that other semiconductor companies don’t have the engineers for.”

• Blackwell’s performance per total cost of ownership (TCO) is 5X Hopper’s.

• “The cost for delivering LLMs is tanking, which is going to induce demand.”

• Nvidia has a lot more software than just CUDA. But CUDA is essential for training, because training is the development stage: engineers are constantly trying new things, and it’s not worth their time optimizing low-level code themselves. They rely on CUDA/Nvidia being fast and good enough out of the box with their development tools.
But on the inference side, which is deployment, customers like Microsoft can see benefits to hiring engineers and tuning models to run on cheaper hardware, since those apps will run for 6 months - much longer than a single training run.
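
A rough break-even sketch of that training-vs-inference logic; every number here (engineer cost, savings rate, monthly GPU bill) is a made-up assumption for illustration, not a figure from the podcast:

```python
# Hypothetical break-even: when is it worth tuning a model to run on cheaper hardware?
# All numbers below are illustrative assumptions, not figures from the podcast.

porting_cost = 4 * 30_000 * 2        # 4 engineers, ~$30k/month fully loaded, 2 months of porting
gpu_bill_per_month = 500_000         # current monthly inference bill on premium GPUs
savings_rate = 0.30                  # assume the tuned deployment is 30% cheaper to serve

for months, label in [(1, "one training experiment"), (6, "six-month inference deployment")]:
    savings = gpu_bill_per_month * savings_rate * months
    verdict = "worth it" if savings > porting_cost else "not worth it"
    print(f"{label}: ${savings:,.0f} saved vs ${porting_cost:,.0f} to port -> {verdict}")
```

With these made-up numbers, the tuning effort doesn’t pay back over one short training experiment but easily does over a six-month deployment, which is the point Patel was making.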

• Patel believes companies are upgrading their non-AI data centers in order to free up power for new GPU installations in those same data centers. Essentially, the new CPUs deliver more performance per watt and per rack, so upgrading them frees up rack space and power for new AI racks and workloads.

• Synthetic data generation is just getting underway and should improve training results beyond what training on (essentially) the entire internet yields today.

• “When you look at The Street’s estimates for capex, they’re all far too low… This whole ‘scale is over’ narrative falls on its face when you look at what the people who know the best are spending on.”

• Nvidia’s source of capital is a lot different from Cisco’s back in the day. And the private-market contribution today is much smaller (even accounting for inflation) than it was during the dot-com boom. Today, the source of the money is cash flow from the most profitable companies in the world.

• GPT-4 cost millions of dollars to train, but it’s generating billions of dollars in revenue.

• Consumers are paying 50X more per query now, but they’re getting value out of it because they’re getting things they couldn’t get before at any cost. The example is code development - spending more on the model is still cheaper than human coders. He gives examples of making $300k/year programmers 20% more efficient, or doing the work of 100 developers with 75 or 50 - those are “so worth using the most expensive model.”

“The cost for intelligence is so high in society”
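
To make that value argument concrete, here’s a back-of-envelope sketch; the salary and efficiency figures are the ones from the podcast, while the per-seat model cost is purely my placeholder assumption:

```python
# Back-of-envelope value math behind "so worth using the most expensive model".
dev_salary = 300_000          # $300k/year programmer (figure from the podcast)
efficiency_gain = 0.20        # 20% more efficient (figure from the podcast)
model_cost_per_seat = 2_400   # hypothetical ~$200/month premium reasoning-model seat

value_per_dev = dev_salary * efficiency_gain     # ~$60k/year of extra output per developer
print(f"Extra output per dev: ${value_per_dev:,.0f}/yr, ~{value_per_dev / model_cost_per_seat:.0f}x the model cost")

# Or the headcount framing: doing the work of 100 developers with 75.
team_before, team_after = 100, 75
net_savings = (team_before - team_after) * dev_salary - team_after * model_cost_per_seat
print(f"Going from {team_before} to {team_after} devs: ~${net_savings:,.0f}/yr saved")
```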

• Memory demand is growing faster than GPU compute. Nvidia’s highest cost is HBM memory, not TSMC.

• The only reason people buy AMD GPUs is that they have more memory in the package. Patel: “Maybe we can’t design as well as Nvidia, but we can put more memory on it… The software isn’t nearly as good, the compute elements aren’t nearly as good, but by golly they’ve got more memory bandwidth per dollar.”

• AMD is missing software, and they won’t spend the money to build a GPU cluster of their own to develop software on. “Which is insane.” Meta and Microsoft are helping them some. But AMD’s share of total AI revenue will decline next year even as its revenue grows.

– More after I eat dinner —

46 Likes

Edit: I tried to make it clearer that I’m super excited to read what Smorgasbord has to write about this amazing BG2 podcast after dinner, and which parts I lifted from the article I used as a comparison.

An article came out today on Seeking Alpha saying mostly the same things as the BG2 podcast above. I read it and then watched this BG2 episode, twice each (I recommend listening to the BG2 at regular speed or slower, even if one did read the article first 🤯).

“Inference reasoning, per query with o3” is my shorthand for both.

Seeking Alpha-

Nvidia Stock Is Set To Surge From OpenAI’s o3 Breakthrough (NVDA)

Key to Nvidia’s future success is the notion that scaling laws still have a way to go. While pre-training scaling is hitting a soft wall due to the data limit (except where synthetic data from machine learning can be functionally proven, yes/no), test-time compute scaling is still in its early stages. As reasoning-centric models proliferate, there is no intrinsic reason why we won’t keep pushing for more complex reasoning per query. The productivity gains made from true reasoning models will justify the large costs associated with running such queries.

With every new model that tests the boundaries of reasoning, Nvidia is in a perfect position to supply even larger compute clusters, faster networking, and more advanced orchestration software. Even the coding layer is dominated by Nvidia’s CUDA framework, which again puts the company at a huge advantage compared to competitors.

Best

Jason

22 Likes

From the BG2 podcast: “Nvidia’s highest cost is HBM memory, not TSMC.”

Per the Micron earnings report, re: HBM revenue, Micron doubled HBM revenue QoQ.

This doesn’t necessarily mean prices will come down with the (undisclosed amount of) increased production. Does anyone here feel they have a finger on the pulse of HBM pricing?

11 Likes

I’ll add a few of my takeaways:

• Broadcom is well positioned because it has the best SerDes technology. Essentially, it can allow chips to interconnect over passive cables instead of active ones. Active cables have chips in them (see the Astera Labs discussion). I have found it difficult to understand how big Astera Labs can get, and this pod did nothing to clarify that for me.

• HBM is going to be at capacity for a long time. They banter a bit about whether HBM changes the age-old cyclical nature of memory. The answer is clearly no, but the runway to its commoditization is probably several years. Will it be enough of Micron’s revenue, at high margins, for long enough for long-term investors? Micron’s margins should improve, and it likely has a few-year runway, especially if worldwide demand for its commodity products gets a boost (PCs, consumer electronics, etc.).

• Lots of discussion on whether 2026 is the year we find out if the build-out has long legs, which of course IS the only question that matters.

8 Likes

While HBM is essential for GPUs, it’s worth noting that Micron’s HBM is not used in any of Nvidia’s GPUs yet, at least based on currently available public information. Nvidia uses HBM from the Korean suppliers.

Luffy

4 Likes

Yeah, this video was 85% about Nvidia and 8% Google, 5% AMD and 2% about Amazon.

As for active vs passive, this article from Patel’s SemiAnalysis covers Amazon’s custom ASIC and why it needs active cabling:

The Trn2-Ultra SKU will also have AEC cables which Astera Labs will be supplying. We believe that the total networking connector and cabling cost will come up to nearly $1,000 per chip. For the Trn2 SKU, although there is no inter-server NeuronLinkv3 AEC cables, the increase of EFAv3 bandwidth to up to 800Gbit/s per chip will more than offset that savings, increasing the total cost for networking connector and cables costs to ~$1.2k per chip.

It’s worth noting that the high usage of Astera in Amazon’s custom chip design is probably related to Amazon’s investment in Astera:

In a somewhat similar vein, Anthropic’s use of Amazon chips is probably related to Amazon’s investment in it:

Here’s a SemiAnalysis article from March on Astera:

(Can’t get all the way through without paying, but there’s a good amount of history and product overview for free).

With Blackwell, Nvidia has reduced Astera retimer content compared to Hopper. That hit the stock price a little while ago. This is why, I believe, Astera is focusing more on the custom ASIC side of their business. Amazon uses a lot of Astera in their design, and Astera touts that a lot, but I haven’t heard much about what Google, Meta, Microsoft, or Oracle are using in terms of Astera content.

15 Likes

Good notes Smorg!

Here’s my recap.

NVIDIA has all the marketshare

  • NVDA has 98% of the AI market w/o GOOG, probably 70% overall.
  • GOOG heavily runs internal workloads on TPUs, for both LLM and non-LLM workloads (Search, Ads, YouTube, etc).
  • GOOG used Broadcom to scale up lower-power ASICs via networking. NVDA then did same w/ NVLink.

Training Scale

Synthetic data:

  • New scaling vector in pre-training: creating synthetic data around objectively provable tasks (math, science, coding, provable goals).
  • Grade the outputs: find the chains of reasoning that lead to the correct answer and build a training pool from those (see the sketch after this list).
  • Plus there’s still a huge untapped pool of video data (probably audio and images too), though transcripts have already been scraped.
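
A minimal sketch of the “grade the outputs, keep the reasoning paths that reach the right answer” idea above (a rejection-sampling-style loop); `generate_chain_of_thought` is a hypothetical stand-in for whatever model and sampler are actually used:

```python
import random

def generate_chain_of_thought(problem: str) -> tuple[str, int]:
    """Hypothetical stand-in for sampling a reasoning chain + final answer from a model."""
    answer = random.choice([41, 42, 43])                      # in practice: an LLM call
    return f"reasoning steps for {problem!r} giving {answer}", answer

def build_synthetic_dataset(problems: list[tuple[str, int]], samples_per_problem: int = 8) -> list[dict]:
    """Keep only reasoning chains whose final answer matches a verifiable ground truth."""
    dataset = []
    for problem, ground_truth in problems:
        for _ in range(samples_per_problem):
            chain, answer = generate_chain_of_thought(problem)
            if answer == ground_truth:                        # objective grading (math/code/provable goals)
                dataset.append({"prompt": problem, "target": chain})
    return dataset

# Toy usage with a problem whose answer can be checked objectively.
synthetic = build_synthetic_dataset([("What is 6 * 7?", 42)])
print(f"Kept {len(synthetic)} verified reasoning traces for the training pool")
```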

Scaling laws:

  • Satya’s recent comments on BG2 were taken the wrong way. The bottleneck has shifted from chips to power.
  • Scaling laws not dead… see how Meta, Amazon, Google, Microsoft are all building huge DCs. (Following xAI’s lead here.)
  • Scaling laws still growing exponentially but getting more and more complicated (networking, liquid cooling, etc.).
  • Plus hyperscalers are now building high-speed fiber bandwidth between DCs to interconnect them. GOOG and MSFT have huge plans to interconnect and distribute load across DCs/regions.

Reasoning:

  • Pre training still scaling up plus new methods emerging.
  • Inference time reasoning is big new scale vector.
  • Chain of thought, with AI generating the reasoning used (thinking).
  • Might be 10x the tokens in the background, hugely ups the underlying and customer costs.
  • Greatly increases memory use and KV-cache needs; can only handle 1/4-1/5 the concurrent usage.
  • Margins will compress and costs passed through, but much higher quality responses.
  • o1 is in its early days; customers don’t really get it yet.
  • Google and Anthropic have reasoning models coming soon.
  • So sure, can save costs by using older models for tasks - but so worth using most recent models for most productivity and value.
  • Improve software dev by 20% (for $300K empl), or perhaps cut staff needed in half, or ship twice as much.

Memory needs

  • Reasoning will balloon memory usage from here.
  • KV cache (the model’s internal memory) needs could grow quadratically (see the rough sizing sketch after this list).
  • More complex reasoning is coming: 100Ks of intermediate tokens generated in internal (AI brain) memory vs 10Ks today.
  • Highest cost component in GPU is the HBM memory (not TSMC mfr/pkg).
  • Samsung, SK Hynix, Micron main memory providers. NVDA mostly using SK Hynix.
  • Low-end mem is fungible (commoditized), prices have fallen.
  • Samsung has no share in HBM.
  • SK Hynix and MU converting capacity to HBM.
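
A rough KV-cache sizing sketch showing why long reasoning traces balloon memory and cut concurrency; the model dimensions are illustrative assumptions (roughly a 70B-class model), not any specific deployment from the podcast:

```python
# Rough KV-cache sizing: memory per request grows linearly with the tokens held in context,
# so 100K-token reasoning traces need ~10x the memory of 10K-token ones, and far fewer
# concurrent requests fit on a GPU. Dimensions below are illustrative, not from the podcast.

layers = 80
kv_heads = 8            # grouped-query attention
head_dim = 128
bytes_per_value = 2     # fp16/bf16

def kv_cache_gb(tokens: int) -> float:
    # 2x for keys and values, per layer, per KV head, per head dimension.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens / 1e9

for tokens in (10_000, 100_000):
    print(f"{tokens:>7,} tokens in context -> ~{kv_cache_gb(tokens):.1f} GB of KV cache per request")
```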

Competition

AMD:

  • AMD is competitive but has no clue how to do software.
  • Don’t have their own native supercomputer. NVDA has multiple to diagnose issues, study architecture changes, simulation.
  • AMD also has no clue how to do system builds (supercomputer racks), acquired ZT Systems to gain knowledge.
  • Hyperscalers are trying to help AMD in these weaker areas. GOOG has been building native supercomputer racks since TPU v3.

Custom ASICs:

  • Dylan predicts fewer custom ASIC sales to MSFT and META. They’ll do okay, but won’t go gangbusters.
  • GOOG TPU has 2nd most workloads (handles all GOOG internal workflows over both LLM and other ML).
  • Each TPU not that impressive (lower network, mem, compute)… key is how it scales up.
  • Google works with Broadcom on interconnects… they’re competitive, maybe better.
  • TPUs scale to 8,000 chips; Google has used water cooling for years and has a better level of reliability.
  • Why not more successful? GOOG keeps TPUs for internal software needs. DeepMind has specialized software and access, not GCP customers.
  • List price is egregious (need to negotiate).
  • Better to use it all internal.
  • MSFT uses GPUs internally too. Gross margin on token generation is 50-70%; renting compute out earns less.
  • Gemini and Search all use TPUs.
  • One customer, Apple, accounts for >70% of TPU rentals. (Apple hates NVDA.)

AWS:

  • AWS Trainium, which Dylan refers to as the "Amazon Basic TPU".
  • Trainium is really cheap, and provides huge memory access. Networking & software not as good.
  • Comparable to the TPU, but used less effectively. Spending a lot on active networking (vs passive), and SerDes speeds are lower.
  • Built by Marvell, and pushing price down.
  • 64 chips in 2 racks (not one). Has way slower interconnection, and less mem per chip.
  • But high memory available, HBM per $ is good.
  • AWS passing cost savings on.
  • Building their own 400K-chip supercomputer to make better LLM models.
  • So is very cheap and cost effective.

Broadcom:

  • Broadcom exploding on custom ASICs. Multiple wins from GOOG, META, OpenAI, Apple (partially).
  • Some coming in heavily in ’25, some in ’26.
  • Microsoft’s chip not as good, don’t expect it to ramp.
  • Networking side is so so important, have to go to Broadcom or Marvell.
  • Should make competitor to NVSwitch from here.
  • Broadcom is strong long term.
  • But over the next 6 months we’ll see a slowdown in GOOG TPU purchases (no data-center room for them).
  • Hyperscalers pushing custom silicon.
  • Google trying to leave Broadcom.

Landscape from here

  • Plans for hyperscalers are firm for 2025.
  • Networking, ASIC all will do well … NVDA, AVGO, MRVL, others.
  • 2026… will it continue?

Neoclouds:

  • Expect consolidation in Neocloud (80 vendors).
  • H100 rental prices tanking.
  • Expect a core 5-10 to emerge, from the sovereign-backed and stronger ones.
  • Hyperscalers have been 50-60% of their revenue till now; sovereigns come next.

NVDA:

  • Blackwell is over 2x the cost of Hopper.
  • NVDA can ship the same volumes and still grow revenue.
  • Do models continue to get better? (Lots of reasoning & multi-modal models coming.)
  • Will CSPs take FCF to zero? (Expect to spend more and more.)
  • Next huge influx of capital from sovereign funds - Saudi, EU, SEA.
  • Elon making everyone chase him (supercluster size).
  • Satya said MSFT is going to buy GPUs based on the revenue they can generate, not spend way ahead. MSFT, META, GOOG are likely all doing that.

45 Likes

It just occurred to me that probably the reason Apple dislikes Nvidia so much is ironically the same reason so many people avoid Apple and get Android: both have created compelling ecosystems around their products, but charge accordingly for participating in those ecosystems.

Thanks for the additional notes!

10 Likes

Smorg - TheInformation had a recent paid piece on Apple and NVIDIA relationship issues if you are curious about it: https://www.theinformation.com/articles/how-apple-developed-an-nvidia-allergy?rc=lrvfn3

(MacRumors summed up some of it if you don’t subscribe to TheInfo.)

It seems Apple is doing everything it can to avoid working with NVDA directly, including renting GPUs from hyperscalers (when Hopper is needed), while also being a big user of Google TPUs and working with Broadcom on custom AI silicon.

From TheInfo piece: “Apple’s Nvidia allergy appears to stem partly from its frugality and a desire to own and control the key technological ingredients for its products to avoid giving others leverage over its operations. But leaders within Apple have also quietly nursed grudges against Nvidia for nearly two decades, stemming from business disputes that originated during the era when Steve Jobs was Apple’s CEO, according to 10 former Apple and Nvidia employees with direct knowledge of the relationship.”

I personally see this less as a Steve Jobs-era grudge being held over (the latter), and more as Apple really wanting to be in control of its own destiny (the former). Meta is doing the same right now with Llama and its own custom AI silicon, as is Google with the TPU and Gemini.

Apple already develops its own Arm CPUs (M and A series), with integrated GPUs and a Neural Engine (NPU), staying away from GPU vendors. They are also rumored to be moving away from Qualcomm by developing their own Wi-Fi & Bluetooth modem.

So it is little surprise that they would want to build their own custom AI chips for server data centers and AI development. This is perhaps a good long-term decision but IMHO a VERY weak short-term one, as shown by the disappointing Apple Intelligence features at their debut. Apple should be moving faster, and would be if they were using NVIDIA Hopper and now Blackwell.

Here was TheInfo’s earlier piece a few weeks ago on Apple’s new AI chip with Broadcom: https://www.theinformation.com/articles/apple-is-working-on-ai-chip-with-broadcom?rc=lrvfn3

-muji

37 Likes