Does anyone think that competitors will start to take a bite out of NVDA sales?

Because the latency to get answers from there to the customer is too large, because the infrastructure isn’t there to support it (you want to build your own power plant, really?), because hiring data center construction/maintenance people there is extremely difficult/expensive, because of worries about weather/disaster impacts, etc., etc.

Just google “how to decide where to build a big data center” and read a couple articles.
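To put a rough number on the latency point, here is a quick back-of-envelope sketch (my own assumptions: fiber carries signals at roughly two-thirds the speed of light, and real routes are far from straight lines):

```python
# Back-of-envelope round-trip latency from a remote data center.
# Assumptions (mine, for illustration): signal speed in fiber ~2/3 c,
# and real-world routing roughly doubles the straight-line distance.

SPEED_OF_LIGHT_KM_S = 299_792          # km per second in vacuum
FIBER_FACTOR = 2 / 3                   # propagation speed in fiber vs vacuum
ROUTE_OVERHEAD = 2.0                   # fiber paths are rarely straight lines

def round_trip_ms(distance_km: float) -> float:
    """Estimate network round-trip time in milliseconds for a given distance."""
    one_way_s = (distance_km * ROUTE_OVERHEAD) / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000

for site, km in [("same metro", 50), ("same region", 800), ("remote site", 5000)]:
    print(f"{site:12s} ~{round_trip_ms(km):5.1f} ms round trip")
# A site 5,000 km away adds ~100 ms before any compute happens --
# tolerable for training jobs, painful for interactive inference.
```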

10 Likes

That is what I am trying to figure out. So this isn’t true? Latency does matter?

So they can’t just train and then move the data to where latency doesn’t matter? They have to be close to the storage capacity, which makes them much more constrained.

Yes, because they have to do it anyway. Switch built its own solar field for power and pulled off the grid completely. A power station with a cheap supply of fuel would be a no-brainer.

Thanks I appreciate that.

Andy

3 Likes

That misses the large AI-specific work Nvidia has done on its chip development, the years that work has taken, and the bet Jensen Huang made back in 2017:

Mr. Huang helped start Nvidia in 1993 to make chips that render images in video games. …
In 2006, Mr. Huang took that further. He announced software technology called CUDA, which helped program the GPUs for new tasks…
A big breakthrough came in 2012 when researchers used GPUs to achieve humanlike accuracy in tasks such as recognizing a cat in an image — a precursor to recent developments like generating images from text prompts. Nvidia responded by turning “every aspect of our company to advance this new field,” Mr. Huang recently said in a commencement speech at National Taiwan University.
In 2017, it started tweaking GPUs to handle specific A.I. calculations.
That same year, Nvidia, which typically sold chips or circuit boards for other companies’ systems, also began selling complete computers to carry out A.I. tasks more efficiently. Some of its systems are now the size of supercomputers, which it assembles and operates using proprietary networking technology and thousands of GPUs. Such hardware may run weeks to train the latest A.I. models.

“This type of computing doesn’t allow for you to just build a chip and customers use it,” Mr. Huang said in the interview. “You’ve got to build the whole data center.”

Competing against Nvidia is tough: you’ve got to skate to where the puck is going, and Wayne Gretzky just sent that puck out for you to follow.

17 Likes

It seems that cooling accounts for only about 5 percent of ongoing operations and maintenance costs. Might not be as big a factor as I thought.

Cooling Costs. Data center owners are extremely cost conscious. That’s why they are so focused on finding sites with low cost clean power, such as the Google site in Nevada which will run on 100% baseload geothermal energy. Alaska does offer the advantage over a site in western Australia in the sense that abundant cooling resources are available. But cooling costs typically represent 5% or less of ongoing operations and maintenance costs. So perhaps the cold winters may not offer quite as many thermal energy advantages as one would think. Iceland clearly shares this advantage with Alaska.

Alaska’s data center opportunity: A reality check and possible next steps | From the Grid (uaf.edu)
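To sanity-check what that 5% means in dollars, here is a tiny sketch with illustrative numbers (the annual budget is made up; only the 5% share comes from the article):

```python
# Rough check on how much a cold climate could save, assuming (per the UAF
# article) cooling is ~5% of ongoing O&M costs. The annual O&M figure is a
# placeholder purely for illustration.

annual_om_cost = 100_000_000          # hypothetical $100M/yr O&M budget
cooling_share = 0.05                  # cooling ~5% of O&M per the article
cold_climate_reduction = 0.50         # assume free-air cooling halves that

cooling_cost = annual_om_cost * cooling_share
savings = cooling_cost * cold_climate_reduction

print(f"Cooling cost:      ${cooling_cost:,.0f}")
print(f"Cold-climate save: ${savings:,.0f} "
      f"({savings / annual_om_cost:.1%} of total O&M)")
# Even halving cooling only trims ~2.5% of O&M, which is consistent with the
# article's point that cold winters aren't a decisive advantage.
```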

Andy

8 Likes

Edit at the bottom

Why I might be looking to sell

Not!!!

Chip design competition in inference?

In this YouTube video, the CEO and founder of Groq says they’re scaling up to surpass Nvidia in inference compute in the data center next year. That’s worth weighing, given that the inference-to-training compute ratio is perhaps 5:1 now but likely to be 95:5 in the next few years (and inference is about 40% of Nvidia’s revenue now…hmmmm).

Despite all my talk about Nvidia’s Moat on this thread and elsewhere…I’d be remiss if I didn’t say this has definitely added some doubt in my mind.

But wait…Was The CEO of Groq full of it?

Yes, on at least his main argument:

He said Groq is 4x faster than ‘the Blackwell chip’ at 10% of the per-token cost.

But, Nvidia does not sell just a Blackwell ‘chip’…

The GB200 is a key component of the NVIDIA GB200 NVL72, a multi-node, liquid-cooled, rack-scale system for the most compute-intensive workloads. It combines 36 Grace Blackwell Superchips, which include 72 Blackwell GPUs and 36 Grace CPUs interconnected by fifth-generation NVLink. Additionally, GB200 NVL72 includes NVIDIA BlueField®-3 data processing units to enable cloud network acceleration, composable storage, zero-trust security and GPU compute elasticity in hyperscale AI clouds. The GB200 NVL72 provides up to a 30x performance increase compared to the same number of NVIDIA H100 Tensor Core GPUs for LLM inference workloads, and reduces cost and energy consumption by up to 25x.

The platform acts as a single GPU with 1.4 exaflops of AI performance and 30TB of fast memory, and is a building block for the newest DGX SuperPOD.
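To see why the comparison unit matters, here is a hedged sketch with entirely made-up throughput and cost figures (none of these are real benchmarks); it just shows how “4x faster at 10% the cost” can shift depending on whether you compare a single chip or the rack-scale system Nvidia actually sells:

```python
# Illustrative only: every throughput and cost number below is an invented
# placeholder, used to show how per-chip and per-system comparisons differ.

from dataclasses import dataclass

@dataclass
class System:
    name: str
    chips: int
    tokens_per_sec_per_chip: float   # hypothetical throughput per chip
    cost_per_hour: float             # hypothetical all-in hourly cost

    def tokens_per_sec(self) -> float:
        return self.chips * self.tokens_per_sec_per_chip

    def cost_per_million_tokens(self) -> float:
        tokens_per_hour = self.tokens_per_sec() * 3600
        return self.cost_per_hour / (tokens_per_hour / 1e6)

groq_card  = System("Groq LPU (1 chip)",       1,  800.0,   2.0)
single_gpu = System("Blackwell GPU (1 chip)",  1,  200.0,   4.0)
# Assume (hypothetically) that the NVL72's NVLink-pooled memory lifts
# effective per-chip throughput well above a standalone GPU's.
nvl72_rack = System("GB200 NVL72 (72 GPUs)",  72,  500.0, 250.0)

for s in (groq_card, single_gpu, nvl72_rack):
    print(f"{s.name:24s} {s.tokens_per_sec():>9,.0f} tok/s  "
          f"${s.cost_per_million_tokens():.2f} per million tokens")
# The per-chip and per-rack numbers tell different stories; the honest
# comparison is whole deployed system vs. whole deployed system.
```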

And interestingly, he gave lots of reasons why Nvidia has the AI training market for sure, and likely the inference market too, ‘literally locked up for at least three years’.

So Nvidia is currently losing some ground as CUDA’s relevance, and therefore that part of its moat, erodes; but Nvidia is building entire ecosystems of enterprises, locking in a continued need for its GPUs over anything else.

Not to mention embodied AGI


Nonetheless…

On a per-chip basis, Groq is killing it!

https://venturebeat.com/ai/groqs-breakthrough-ai-chip-achieves-blistering-800-tokens-per-second-on-metas-llama-3/

Edit:
I asked Grok what the difference is between Nvidia’s approach and Groq’s, and then ‘why Nvidia doesn’t design chips the way Groq does’.

The innovator’s dilemma is in the answer here.

Nvidia is not adopting Groq’s approach to address the memory wall due to several factors. One key reason is the difference in the architectural design and market focus of the two companies. Nvidia’s GPUs are designed for a broad range of applications, including gaming and AI training, whereas Groq’s LPUs are specifically tailored for AI inference tasks, which require ultra-low latency and high efficiency. This specialization in AI inference allows Groq to optimize its hardware and software for this specific use case, which Nvidia has not pursued to the same degree.

Additionally, Nvidia has a long-established market presence and a wide range of products catering to various industries and applications. It may be challenging for Nvidia to pivot its entire product line and design philosophy to match Groq’s approach without disrupting its existing business and customer base.

Furthermore, Nvidia has been investing heavily in its own technologies and innovations to address the memory wall and other performance bottlenecks. For example, Nvidia’s latest GPU architectures incorporate advanced memory management and bandwidth optimization techniques to improve performance.

In summary, Nvidia’s focus on a broad range of applications, its existing market position, and its ongoing investment in its own technologies are likely reasons why it is not adopting Groq’s approach to address the memory wall.
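On the “memory wall” Grok mentions, a rough roofline-style sketch (with assumed, round hardware numbers, not official specs) shows why batch-1 LLM decoding tends to be limited by memory bandwidth rather than raw FLOPS, which is exactly the bottleneck Groq’s on-chip SRAM design targets:

```python
# Roofline-style sketch of the "memory wall" for LLM decoding.
# Hardware numbers are round assumptions for illustration, not official specs.

peak_flops = 1.0e15          # assumed ~1 PFLOP/s of usable compute
mem_bandwidth = 4.0e12       # assumed ~4 TB/s of memory bandwidth

# Decoding one token at batch size 1 reads every weight once and does
# roughly 2 FLOPs per weight (multiply + add).
params = 70e9                # 70B-parameter model
bytes_per_param = 1          # 8-bit weights
flops_per_token = 2 * params
bytes_per_token = params * bytes_per_param

compute_time = flops_per_token / peak_flops
memory_time = bytes_per_token / mem_bandwidth

print(f"compute-bound time per token: {compute_time * 1e3:.2f} ms")
print(f"memory-bound time per token:  {memory_time * 1e3:.2f} ms")
print("bottleneck:", "memory bandwidth" if memory_time > compute_time else "compute")
# Memory time dominates, so inference speed hinges on how fast weights can be
# streamed -- the gap Groq attacks with on-chip SRAM and Nvidia attacks with
# HBM bandwidth and NVLink-pooled memory.
```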

I would add to Grok’s answer that the timing of when to focus on inference alone is likely already in Nvidia’s roadmap; just a guess.
Best

Jason

17 Likes

I’m a little late on this thread, but I don’t think Dojo is relevant. The thread asked if competitors will erode Nvidia’s market. Dojo is not intended (at least so far) to be available to the general market. Tesla designed the chip to service their neural net AI. It’s for internal use exclusively. Further, Musk recently said he’s buying every chip Nvidia is willing to sell him.

15 Likes