After Intel earnings, what next for AMD and Nvidia?

So:

  • Apparently the desktop market in particular is… not falling as fast as it was, which is not to say it’s coming back yet… and that helped boost Intel to an unexpectedly profitable quarter.
  • Pulling in Ericsson as a foundry customer is a good start…
  • Intel continues to lose server share to AMD and is still fighting to become relevant in AI.

Anyone have thoughts on whether this is a tailwind for AMD as well, or less so because of (a) the price point of most desktop parts and (b) AMD’s focus on server/data center products and AI, followed by laptop sales, with desktop probably the lowest priority?

Is Nvidia the more interesting/important point of comparison vs. Intel?


“There’s a lot of interest in the industry for advanced packaging, because it is essential to deliver high-performance computing and AI,” Gelsinger said. “So we expect a lot more business coming our way in that area.”

On Tuesday, the company said it would work with Swedish telecommunications gear maker Ericsson (ERICb.ST) on a chip that Intel will fabricate with the most advanced manufacturing technology it has disclosed.

## LAGGING IN AI

Sales in Intel’s data center and artificial intelligence business fell 15% to $4 billion from $4.7 billion in the year-ago quarter.

Those results beat Wall Street estimates, but reflect that cloud majors Microsoft (MSFT.O) and Alphabet expect to ramp up spending on data centers, with most of that spending benefiting Nvidia (NVDA.O), which makes chips for AI.

The focus on chips suited for AI computing in the cloud has hurt the market for Intel’s server chips, as has a sluggish recovery in China.

“It is still very clear that Intel is absolutely losing share around server CPUs, and I think it is fair to say that they are fighting for relevance in AI,” said Jenny Hardy, portfolio manager at GP Bullhound, which owns AMD and Nvidia stock.

An inventory glut in server central processing units (CPUs) will persist until the second half of the year, Gelsinger said on the conference call, and data center chip sales will decline modestly in the third quarter before recovering in the fourth.

Gelsinger said Intel currently has enough customer orders to sell at least $1 billion worth of its AI chips through 2024.

Intel forecast adjusted current-quarter earnings per share of 20 cents. Analysts polled by Refinitiv expected 16 cents.

It forecast adjusted revenue of about $12.9 billion to $13.9 billion, compared with estimates of $13.23 billion. The midpoint of $13.4 billion exceeded estimates but still implies a 12.6% year-over-year drop in Intel’s business.

Intel forecast adjusted gross margin of 43% for the third quarter, compared to estimates of 40.6%.

Intel shares have risen about 30% so far this year, compared to a 50% rise on the Philadelphia SE Semiconductor index (.SOX) in anticipation of an industry recovery.


A good quarter for INTC, beating estimates and growing revenue and EPS Q over Q. They still show a decline when you look Y over Y, as many do. I am not hopeful that AMD’s client segment will be a bright spot, as it still had a significant inventory overhang of older products last quarter. I think the best that can be hoped for is that the inventory problem gets cleaned up.

Servers are going to be interesting, and I don’t know what to expect. Clearly NVIDIA is increasing sales faster than they can make product. It appears that neither AMD’s MI250 nor INTC’s Ponte Vecchio was able to fill the void. OTOH, INTC did ship a significant number of its Gaudi accelerators to backfill the Nvidia shortage.

INTC stated the root cause of the x86 server decline was big server customers moving capex to AI accelerators rather than building out additional “regular” compute. If this is the case, it could be ugly for AMD. I do believe Q2 was the first quarter of high-volume production of Genoa EPYC, so with this volume layered on top of Milan I expect AMD to have done very well in x86 server share. Intel may win overall server revenue share growth due to Gaudi, but AMD should win x86 server share growth.


It may look bad for AMD for Q3, but the MI300 will be announced in Q4, and I suspect a lot will be shipped before then. It looks to be a big improvement over the MI200 series. But the important point here is that the MI series works much better with AMD CPUs. Yes, Nvidia is supporting sharing memory with the CPU; in other words (assuming the CPU also supports it), the GPU can directly address CPU memory. AMD, though, is a step beyond that: the CPU and GPU can share caches. Yes, it is currently only at the L3 cache-line level, but that is a significant improvement in latency over fetching data a word or four at a time. Can it lead to thrashing? Sure. Tasking/threading logic will need to prevent writes to the same word by the CPU and GPU (or read-write timing errors). In practice, you need to do this control at cache-line granularity anyway; see the sketch below.
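To make that cache-line discipline concrete, here is a minimal C++ sketch (my illustration, not AMD’s actual runtime; all names are made up): each writer gets its own 64-byte line, so a CPU thread and a GPU kernel, or any two writers, never contend for ownership of the same line.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Illustration only: give each concurrent writer its own 64-byte cache line
// so two writers never ping-pong ownership of the same line.
// 64 bytes is the x86 line size.
constexpr std::size_t kCacheLine = 64;

struct alignas(kCacheLine) PerWorkerAccumulator {
    double sum = 0.0;
    // alignas pads sizeof to a full line, so adjacent accumulators in the
    // vector below never share a cache line (no false sharing).
};

int main() {
    constexpr int kWorkers = 8;
    std::vector<PerWorkerAccumulator> acc(kWorkers);

    std::vector<std::thread> pool;
    for (int w = 0; w < kWorkers; ++w) {
        pool.emplace_back([&acc, w] {
            for (int i = 0; i < 1000000; ++i)
                acc[w].sum += i;  // each thread writes only its own line
        });
    }
    for (auto& t : pool) t.join();
}
```

The same trick applies whether the second writer is another core or a cache-coherent GPU; the point is that the partitioning is done at line granularity, not word granularity.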

I’ve read that AMD is selling out of the new 7700 and 7800 GPUs through vendors around the world. I don’t know if that will help with earnings this quarter, since they were just released…doc

> The CPU and GPU can share caches.

I am sure this is the case for the MI300A, which has both GPU and CPU on the same chip. I would think for the MI300X (the AI device) it would depend on which CPU is used and the board topology/interconnect.
Alan


It has actually been a part of AMD CPUs since Clawhammer and Sledgehammer (the first Opteron), but it waited for GPUs with a shareable L3 cache. Basically, the GPU tells the CPU that it is not a CPU but a partner device which can share memory and (L3) cache. The CPU assigns unique addresses to the video memory and scans the GPU L3 in parallel with making a request to main memory. (Remember that the AMD CPU L3 is a victim cache, so the six or more L2s must be checked as well. The L1 caches duplicate data in L2, so it is only necessary to check the L2s and L3.) Yes, this requires a lot of silicon, but it is already there for systems with multiple Zen chips, so it is no extra effort.

There may be data fetches where caching is slower than accessing main or graphics memory, but on average it is much faster. An example might be fetching a “dirty” cache line from L1D: that line needs to be written to main memory in addition to being copied to the local L2 and possibly L3, then the GPU L3, L2, and L1. I haven’t tried to time that case, but I probably should; something like the sketch below would be a start.
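For the cross-core flavor of that case, a rough microbenchmark could look like this (my code, with hypothetical names): one thread dirties a line in its L1D, and a second thread then loads it, forcing a cache-to-cache transfer. There is no core pinning or warm-up, and a single timed load sits near the clock’s resolution, so treat any number it prints as ballpark only.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>

// One 64-byte line, aligned so the timed load touches exactly one line.
alignas(64) static std::uint64_t line[8];
static std::atomic<int> phase{0};

int main() {
    std::thread writer([] {
        line[0] = 42;                               // dirty the line in this core's L1D
        phase.store(1, std::memory_order_release);  // publish to the reader
    });

    std::thread reader([] {
        while (phase.load(std::memory_order_acquire) != 1) {}  // wait for the write
        auto t0 = std::chrono::steady_clock::now();
        volatile std::uint64_t v = line[0];         // remote dirty-line fetch
        auto t1 = std::chrono::steady_clock::now();
        (void)v;
        std::printf("dirty-line load: %lld ns (includes clock overhead)\n",
                    (long long)std::chrono::duration_cast<
                        std::chrono::nanoseconds>(t1 - t0).count());
    });

    writer.join();
    reader.join();
}
```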

I think this assumes you are connecting some number of AMD devices using Infinity Fabric? It seems like you get the same thing using CXL, which was incorporated into both Genoa and the MI300. I am not sure what sort of scale and topology we will see with very large MI300X-based systems. We do know that the MI300As in El Capitan are connected with Slingshot, which I suspect does not support sharing cache?
Alan

I think Slingshot supports caching but does not share caches. In other words, a cache line can be sent and received over Slingshot and marked as shared on the sending CPU, but the line will have to be sent back to the source CPU if modified. I won’t go into the messiness of writing to the same cache line on two CPUs; the solution, as on a single- or dual-socket system, is just don’t do that.

Assuming multiple readers and a single writer, sharing cache lines between nodes and between the eight CPUs on a single board just differs in latency. The big advantage is that your code doesn’t have to treat access to local CPUs and remote nodes differently; the single-writer pattern sketched below is the same either way. You could use CXL for all inter-thread communications, but that would be slower.
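A minimal sketch of that single-writer, multiple-reader pattern (my illustration; the names are invented): one thread publishes a value with a release store, and any number of readers poll with acquire loads. The code is identical whether the readers sit on the same chip, another socket, or a cache-coherent remote node; only the latency of the line transfer changes.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Published value gets its own cache line so readers polling it do not
// collide with unrelated writes.
struct alignas(64) Published {
    std::atomic<std::uint64_t> value{0};
};

int main() {
    Published p;

    std::thread writer([&] {
        p.value.store(123, std::memory_order_release);  // the single writer
    });

    std::vector<std::thread> readers;
    for (int r = 0; r < 4; ++r) {
        readers.emplace_back([&] {
            // Each reader polls with acquire loads; no reader ever writes
            // the line, so the line can stay Shared in every reader's cache.
            while (p.value.load(std::memory_order_acquire) == 0) {}
        });
    }

    writer.join();
    for (auto& t : readers) t.join();
}
```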

Are there cases where CXL will be the same speed or faster than cache sharing? Yep. One of my favorite instructions, PREFETCHNTA, intended to avoid cache pollution, can mess things up big time. On recent AMD processors, AFAIK, it reads the data into the L2 cache, probably into only one way. If you have written to a line fetched by PREFETCHNTA, a fetch from another CPU (or, for that matter, another core) will need to write the line to the L2 and L3 caches before it can be shared, or remove it from the caches instead. CLFLUSH will do this and write to memory when necessary, but now you need to write and read memory. The write queue will speed this up, but it is not something I would like in my code. So use PREFETCHNTA with care when writing to the data. (The code I use for matrix multiplication does a row from A times a row from B transposed. These can both use PREFETCHNTA, as the only writes are to C, O(n²) times. Oh, and if A and B are the same shape, offset the start of B so they don’t collide. 😉) A sketch of that pattern follows.
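Here is a minimal sketch of that A-row times B-transposed-row pattern, using the _mm_prefetch intrinsic with the NTA hint rather than raw PREFETCHNTA assembly. The function name and the prefetch distance kAhead are my guesses rather than tuned values, and the B-offset trick from the post is omitted.

```cpp
#include <immintrin.h>
#include <cstddef>

// C = A * B, with B supplied pre-transposed (Bt), so each C[i][j] is a dot
// product of two contiguous rows. A and B are streamed with non-temporal
// prefetch hints; the only writes are to C, O(n^2) stores total.
void matmul_bt(const float* A, const float* Bt, float* C, std::size_t n) {
    constexpr std::size_t kAhead = 16;  // elements ahead: one 64-byte line of floats
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            const float* a = A  + i * n;   // row i of A
            const float* b = Bt + j * n;   // row j of Bt == column j of B
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                if (k + kAhead < n) {
                    // Hint: fetch ahead without polluting higher cache levels.
                    _mm_prefetch((const char*)(a + k + kAhead), _MM_HINT_NTA);
                    _mm_prefetch((const char*)(b + k + kAhead), _MM_HINT_NTA);
                }
                acc += a[k] * b[k];
            }
            C[i * n + j] = acc;            // the only write in the kernel
        }
    }
}
```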

I am only so concerned about weakness in Q3 – if it slips into the 90s I will pick up some more shares, if I can raise cash between now and then by selling some gainers (or losers, for that matter). I kinda like our chances 🙂