Edit at the bottom
Why I might be looking to sell
Not!!!
Chip-design competition in inference?
This YouTube video with the CEO and founder of Groq says they’re scaling up to surpass Nvidia for inference compute in the data center next year. And consider that the inference-to-training compute ratio is likely around 5:1 now but headed toward 95:5 in the next few years (and inference is already about 40% of Nvidia’s revenue now…hmmm).
Despite all my talk about Nvidia’s Moat on this thread and elsewhere…I’d be remiss if I didn’t say this has definitely added some doubt in my mind.
But wait…was the CEO of Groq full of it?
Yes, on at least his main argument:
He said Groq is 4x faster than ‘the Blackwell chip’ at 10% of the per-token cost.
But, Nvidia does not sell just a Blackwell ‘chip’…
The GB200 is a key component of the NVIDIA GB200 NVL72, a multi-node, liquid-cooled, rack-scale system for the most compute-intensive workloads. It combines 36 Grace Blackwell Superchips, which include 72 Blackwell GPUs and 36 Grace CPUs interconnected by fifth-generation NVLink. Additionally, GB200 NVL72 includes NVIDIA BlueField®-3 data processing units to enable cloud network acceleration, composable storage, zero-trust security and GPU compute elasticity in hyperscale AI clouds. The GB200 NVL72 provides up to a 30x performance increase compared to the same number of NVIDIA H100 Tensor Core GPUs for LLM inference workloads, and reduces cost and energy consumption by up to 25x.
The platform acts as a single GPU with 1.4 exaflops of AI performance and 30TB of fast memory, and is a building block for the newest DGX SuperPOD.
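To put the chip-vs-system point in numbers, here’s a quick back-of-envelope sketch. Only the CEO’s 4x claim and the 72-GPUs-per-rack figure come from the material above; the per-GPU throughput is a made-up placeholder, and the conclusion doesn’t depend on it.

```python
# Back-of-envelope on the "4x faster than a Blackwell chip" claim.
# Only the 4x speed claim and the 72-GPUs-per-rack figure come from the text above;
# the absolute per-GPU throughput R is a placeholder, and the result doesn't depend on it.

R = 1_000                    # hypothetical tokens/sec for one Blackwell GPU (placeholder)
groq_per_chip = 4 * R        # the CEO's claim: 4x faster per chip
gpus_per_nvl72 = 72          # one NVL72 rack, per Nvidia's description above

rack_throughput = R * gpus_per_nvl72
groq_chips_to_match_rack = rack_throughput / groq_per_chip   # = 72 / 4 = 18, regardless of R

print(f"One NVL72 rack ≈ {rack_throughput:,} tokens/sec at the assumed per-GPU rate")
print(f"Groq chips needed to match it: {groq_chips_to_match_rack:.0f}")
```

So even taking the 4x claim at face value, the apples-to-apples comparison is roughly 18 Groq chips (plus their racks, networking, and power) against one NVL72, not one chip against one chip, and the 10%-per-token cost claim would have to be judged at that system level too.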
And interestingly, he gave lots of reasons why Nvidia has the AI training market for sure, and likely the inference market as well, ‘literally locked up for at least three years’.
So Nvidia is currently losing ground on CUDA’s relevance, and therefore on that part of its moat; but Nvidia is also building entire ecosystems of enterprises, locking in a continued need for its GPUs over anything else.
Not to mention embodied AGI
Nonetheless…
On a per-chip basis, Groq is killing it!
https://venturebeat.com/ai/groqs-breakthrough-ai-chip-achieves-blistering-800-tokens-per-second-on-metas-llama-3/
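A trivial sense check on what 800 tokens per second means for a single reply (the reply length is my own assumption, not from the article):

```python
# What 800 tokens/sec means for one reply; the reply length is an assumed example.
tokens_per_sec = 800
reply_tokens = 500  # a typical-ish chatbot answer (assumption)
print(f"A {reply_tokens}-token reply streams in ~{reply_tokens / tokens_per_sec:.2f} s")  # ~0.62 s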
Edit:
I asked Grok what the difference is between Nvidia’s approach and Groq’s, and then ‘why Nvidia doesn’t design chips the way Groq does’.
The innovator’s dilemma is in the answer here.
Nvidia is not adopting Groq’s approach to address the memory wall due to several factors. One key reason is the difference in the architectural design and market focus of the two companies. Nvidia’s GPUs are designed for a broad range of applications, including gaming and AI training, whereas Groq’s LPUs are specifically tailored for AI inference tasks, which require ultra-low latency and high efficiency. This specialization in AI inference allows Groq to optimize its hardware and software for this specific use case, which Nvidia has not pursued to the same degree.
Additionally, Nvidia has a long-established market presence and a wide range of products catering to various industries and applications. It may be challenging for Nvidia to pivot its entire product line and design philosophy to match Groq’s approach without disrupting its existing business and customer base.
Furthermore, Nvidia has been investing heavily in its own technologies and innovations to address the memory wall and other performance bottlenecks. For example, Nvidia’s latest GPU architectures incorporate advanced memory management and bandwidth optimization techniques to improve performance.
In summary, Nvidia’s focus on a broad range of applications, its existing market position, and its ongoing investment in its own technologies are likely reasons why it is not adopting Groq’s approach to address the memory wall.
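For anyone wondering what the ‘memory wall’ actually is here: when an LLM generates tokens one at a time, roughly all of the model’s weights have to be read from memory for each new token, so at small batch sizes the speed limit is memory bandwidth divided by model size, not raw compute. A rough illustrative calc (the bandwidth and model-size numbers are generic ballpark assumptions of mine, not vendor specs; Groq attacks the bandwidth side by spreading a model across many chips’ on-chip SRAM):

```python
# Memory-wall intuition for LLM decoding: generating one token requires
# streaming roughly all model weights from memory, so at batch size 1
#   max tokens/sec  ≈  memory bandwidth (bytes/s) / model size (bytes)
# All numbers below are generic ballpark assumptions for illustration, not vendor specs.

def decode_ceiling_tokens_per_sec(params_billions, bytes_per_param, bandwidth_tb_per_s):
    model_bytes = params_billions * 1e9 * bytes_per_param
    return (bandwidth_tb_per_s * 1e12) / model_bytes

# Example: a 70B-parameter model stored in 8-bit weights (1 byte/param).
for label, bw_tb_s in [("single HBM-class GPU (assumed ~3 TB/s)", 3.0),
                       ("many chips of on-chip SRAM pooled (assumed ~80 TB/s)", 80.0)]:
    ceiling = decode_ceiling_tokens_per_sec(70, 1, bw_tb_s)
    print(f"{label}: ~{ceiling:,.0f} tokens/sec ceiling per sequence")
```

Batching changes the picture, since a GPU can amortize one pass over the weights across many concurrent requests, which is part of why the two architectures end up optimized for different serving regimes (throughput per dollar vs. latency).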
I would add to this that the timing of when to focus on inference-only designs is likely already in Nvidia’s roadmap, but that’s just a guess.
Best
Jason