AMD: 10% data center GPU share?

https://seekingalpha.com/article/4712767-amd-nears-10-percent-data-center-gpu-share-less-than-3-quarters-post-mi300-launch

Some interpolation/estimation/speculation going on here, but even if it's only directionally right, it's a good sign.

The following table takes Nvidia's disclosed data center numbers and interpolates them to calendar-quarter numbers, subtracts estimated networking and software components, and estimates Nvidia DC GPU revenues on a calendar basis. For forward revenue estimates, Nvidia is assumed to beat guidance by $2B in Q2, but growth is forecasted to tail off a bit towards the end of the year due to Blackwell delays. AMD is assumed to deliver $5B in revenues for 2024. Note that $5B is in line with most street expectations, and we are not modeling AMD to gain materially from Blackwell delays.
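
The article doesn't publish its exact spreadsheet, but a minimal sketch of that kind of adjustment might look like the code below. The fiscal-to-calendar weights assume Nvidia's fiscal quarters run roughly February–April, May–July, and so on; the networking/software fraction and the dollar inputs are placeholders, not disclosed figures or the author's actual model.

```python
# Illustrative sketch only: the weights, the 15% networking/software
# fraction, and the revenue figures are assumptions, not Nvidia disclosures.

def calendar_q_dc_revenue(fq_prev: float, fq_curr: float) -> float:
    """Blend two adjacent Nvidia fiscal quarters (~Feb-Apr, ~May-Jul, ...)
    into one calendar quarter (Apr-Jun): one overlapping month from the
    first fiscal quarter, two from the second."""
    return fq_prev / 3 + 2 * fq_curr / 3

def dc_gpu_revenue(dc_total: float, net_sw_fraction: float = 0.15) -> float:
    """Subtract an estimated networking + software slice (placeholder 15%)."""
    return dc_total * (1.0 - net_sw_fraction)

# Hypothetical fiscal-quarter data center revenues, in $B:
cal_q2_total = calendar_q_dc_revenue(fq_prev=22.0, fq_curr=26.0)
print(f"Calendar-Q2 DC GPU revenue estimate: ${dc_gpu_revenue(cal_q2_total):.1f}B")
```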

[Image: AMD rapidly gaining market share (Author)]

What we see in the image above is that AMD started its MI300 ramp with a bang, taking 4% market share in the very first quarter of shipments and reaching 7% revenue market share in the recently concluded Q2. In other words, AMD, from a standing start, has been able to get to greater than 6% revenue market share in less than three quarters! That is a very rapid market share gain.

Unit share is growing faster than revenue share. If we adjust for the ASP discrepancy between the Nvidia H100 and the AMD MI300, it seems likely that AMD had over 9% unit share in Q2. The company is approaching 10% unit market share in less than three full quarters! There is probably no sell-side or buy-side analyst who estimated that AMD would get to nearly 10% market share in less than three quarters.
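
To make the revenue-share vs. unit-share distinction concrete, here is a small sketch of the arithmetic. The revenues and ASPs are made-up placeholders chosen only to show how a lower AMD ASP pushes unit share above revenue share.

```python
# Placeholder figures, not actual AMD/Nvidia revenues or prices.

def revenue_share(amd_rev: float, nvda_rev: float) -> float:
    """AMD's share of combined data center GPU revenue."""
    return amd_rev / (amd_rev + nvda_rev)

def unit_share(amd_rev: float, nvda_rev: float,
               amd_asp: float, nvda_asp: float) -> float:
    """Convert revenue to units via ASP, then take AMD's share of units."""
    amd_units = amd_rev / amd_asp
    nvda_units = nvda_rev / nvda_asp
    return amd_units / (amd_units + nvda_units)

amd_rev, nvda_rev = 1.0e9, 14.0e9        # hypothetical quarterly revenues ($)
amd_asp, nvda_asp = 15_000, 22_500       # hypothetical per-GPU prices ($)

print(f"revenue share: {revenue_share(amd_rev, nvda_rev):.1%}")                   # ~6.7%
print(f"unit share:    {unit_share(amd_rev, nvda_rev, amd_asp, nvda_asp):.1%}")   # ~9.7%
```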

Note that AMD has been able to get to nearly 10% DC GPU market share with just three major customers: Microsoft (MSFT), Meta (META), and Oracle (ORCL). Microsoft was the first large hyperscaler to make MI300X instances publicly available in Q2 and has driven a substantial part of the revenues. Most of AMD's MI300 revenues to date have come from these three hyperscalers.

As fast as the MI300 ramp has been with hyperscalers, in the enterprise AMD has been hampered by a lagging software ecosystem. AMD has moved aggressively to resolve this problem: to strengthen the software and support ecosystem, it acquired Silo AI, Mipsology, and Nod.ai, and has invested over $125 million across a dozen AI companies in the past 12 months. Management noted that Hugging Face was one of the first customers to adopt the new Azure instances, enabling enterprise and AI customers to deploy hundreds of thousands of models on MI300X GPUs with one click. As its software matures, AMD is starting to accumulate design wins: enterprise vendors such as Dell (DELL), HPE (HPE), and Lenovo (OTCPK:LNVGY) have adopted MI300, and the enterprise ramp is expected to begin in earnest in H2.

During AMD’s Q2 earnings call, Lisa Su noted that there was a large pipeline of customers currently evaluating MI300:

“MI300 enterprise and cloud AI customer pipeline grew in Q2, and multiple hyperscale and Tier 2 cloud providers are on track to launch MI300 instances in Q3.”

Note the emphasis on “multiple hyperscale” customers. For a company that currently has only three hyperscaler design wins, “multiple hyperscalers” launching instances in Q3 is huge news, and many investors and analysts do not seem to grasp its significance. As a result of these design wins, we are forecasting that AMD will end 2024 with greater than 10% revenue share and greater than 12% unit share.


Yes, sounds very good. It’s wonderful to watch growing companies compound in size. Let’s be cautious, though: it seems as if Microsoft et al. are struggling to monetize it. I have never seen one of these AI assistants write a 30-line program that worked. Rather than being in the pleasant space of debugging code you wrote yourself, you end up debugging “someone else’s” code without any helpful comments.


I come from a different background, where the testing harnesses for real-time software outweigh the actual product code by a substantial margin. Why? Because you can’t guarantee zero bugs by normal testing, you need to write (logical) test code that doesn’t just run the code thousands of times, but proves that the requirements are met. Notice that these requirements are often about execution times, so you need to prove that the code under test always gives the right answer, and does so within a specified time. That often (at the higher levels) includes multiple threads on different CPU cores. It used to be separate CPUs, which is much harder, since you end up losing CPU clocks synchronizing with the bus. Multiple high-priority processes using the same bus? Arrgghh!

So I would be happy with AI that could design the test harnesses, or reuse existing test programs and change the parameters and the unit under test. Building test harnesses is not such a big deal, since testing frameworks are highly reusable: just put in the right parameters for this procedure or function. Recalculating those parameters when the requirements change? That’s why productivity for real-time code tends to be around one or two SLOC (source lines of code) per day.
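
As a toy illustration of that kind of reuse (my own sketch, not a real-time certification tool, and a single timed run is of course not a worst-case-execution-time proof): one generic check verifies both the result and a time budget, so changing requirements mostly means editing a parameter table. The scale function and the budgets are made up for the example.

```python
import time

def check(unit_under_test, args, expected, budget_s):
    """Run the unit once; verify the result and the elapsed wall-clock time."""
    start = time.perf_counter()
    result = unit_under_test(*args)
    elapsed = time.perf_counter() - start
    assert result == expected, f"wrong result: {result!r} != {expected!r}"
    assert elapsed <= budget_s, f"too slow: {elapsed:.6f}s > {budget_s:.6f}s"
    return elapsed

def scale(x, k):
    """Stand-in for the real unit under test."""
    return x * k

# Reusing the harness is mostly a matter of editing this parameter table.
CASES = [
    (scale, (2, 10), 20, 0.001),   # (unit, inputs, expected, time budget in s)
    (scale, (0, 99), 0, 0.001),
]

for fn, args, expected, budget in CASES:
    elapsed = check(fn, args, expected, budget)
    print(f"{fn.__name__}{args} ok in {elapsed * 1e6:.1f} µs")
```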

Why bring this up? The Intel problems with 13th- and 14th-gen CPUs mean that, until Intel can characterize these problems, there is no way to use them in safety-critical or operation-critical hardware. It is not a big deal if you know that a particular opcode is wonky; just add a rule not to use it in the safety-critical part of the code. But if you don’t know what is wonky, which is the current status of these Intel CPUs, your project deadlines include an unknown delay unless you switch to an earlier Intel CPU or a CPU from another vendor, which right now means AMD almost 100% of the time. (Tesla is manufacturing its own CPUs. For now, this won’t affect the AMD CPUs in the large display. I don’t know what CPUs they use for the rest of the vehicle, but there are lots of them.)

Today, it is common to design such software so that the initialization part is not subject to strict timing requirements. Then the operating software and data are all in L2 cache, except for display, debugging, or I/O code. Hmm, that’s not quite right: writes send data to I/O devices, but a copy remains in cache. It hasn’t always been that way… Input code is written so that you read a value once, keep the copy in cache, and only do a memory read when you want or need new data.
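
A rough sketch of that read-once pattern, in Python rather than the C you would actually use on a controller; read_device here is a made-up stand-in for the real memory-mapped or bus read.

```python
import random

class CachedInput:
    """Read a device value once, then serve the cached copy until a
    fresh sample is explicitly requested."""

    def __init__(self, read_device):
        self._read = read_device     # the expensive device/memory read
        self._value = None

    def refresh(self):
        """Do the real read; call only when new data is actually wanted."""
        self._value = self._read()
        return self._value

    def value(self):
        """Return the cached copy without touching the device."""
        if self._value is None:      # first use: populate the cache
            return self.refresh()
        return self._value

# Usage sketch: the hot loop works from sensor.value(); only the points
# that genuinely need fresh data call sensor.refresh().
sensor = CachedInput(lambda: random.random())   # stand-in "device"
sensor.refresh()
for _ in range(3):
    _ = sensor.value()   # no device access here
```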