These are weird. With 64GB of RAM in the CPU package you can apparently run your server with no memory in the slots and no software changes etc… Everything just works, and the system runs cooler and uses less power. Or you can treat the 64GB as a giant cache in front of main memory. Or, various other combinations – the 64GB is actually divided into quarters and you can flexibly allocate however you want.
Intel Xeon MAX 9480 Deep-Dive 64GB HBM2e Onboard Like a GPU or AI Accelerator - Page 3 of 5
At this point, we have set up one of the key challenges of Xeon Max: just how many options there are. To be clear, you can put the CPUs into a system without DDR5, and the system boots up normally. Likewise, you can add DDR5 and it will work normally, but the Xeon Max has extra options for tuning. Consider just two choices: booting the system with or without DDR5, and either treating the CPU as one set of resources with 64GB of HBM2e memory and 8x DDR5 channels or splitting those up into four sets. Together, they give us a 2×2 matrix of configuration options.
Intel Xeon Max 9480 Memory Config 2×2 Matrix
That is not all, though. Intel has two different modes called “HBM Flat Mode” and “HBM Caching Mode” when using a system with DDR5. The easy way to think about this is that flat mode looks like two separate pools of memory, one HBM2e and one DDR5, with total capacity additive between the two pools. The HBM caching mode stores hot data in HBM while using the DDR5 as the primary memory store. Of course, the amount of DDR5 installed in a system is another dimension that we are going to conveniently skip. With 128GB of DDR5 and 64GB of HBM2e per socket, as we have here, the ratio is 2:1, but with 128GB DDR5 DIMMs it would clearly be different.
Intel Xeon Max Summary Slide
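Here is an example of the chip split into quadrants with 16GB of HBM2e and 32GB of DDR5 (2x 16GB DDR5-4800 DIMMs) per quadrant. One can see that our total memory capacity is 128GB even though we have 64GB of HBM2e and 128GB of DDR5, because in caching mode the HBM2e sits in front of the DDR5 as a cache rather than adding to the addressable capacity.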
Xeon Max Cache Topo
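The capacity arithmetic behind flat and caching mode can be sketched in a few lines of Python. This is only an illustration using the figures quoted above (64GB of HBM2e, 128GB of DDR5, four quadrants). The function name and mode labels are made up for this example and are not Intel terminology or any real tool's API, and it assumes caching mode exposes only the DDR5 capacity, which is consistent with the 128GB total in the example.

```python
def visible_capacity_gb(hbm_gb: int, ddr_gb: int, mode: str) -> int:
    """Return the memory capacity (GB) the OS would see for one socket.

    Mode labels are hypothetical names for this sketch:
      "hbm_only" - no DDR5 installed, HBM2e is the only memory
      "flat"     - HBM2e and DDR5 are separate pools, capacities add up
      "cache"    - HBM2e caches DDR5, so only DDR5 counts as capacity
    """
    if mode == "hbm_only":
        return hbm_gb
    if mode == "flat":
        return hbm_gb + ddr_gb
    if mode == "cache":
        return ddr_gb
    raise ValueError(f"unknown mode: {mode}")


HBM_GB = 64      # HBM2e per socket, from the article
DDR_GB = 128     # DDR5 per socket in the reviewed configuration
QUADRANTS = 4    # the package can be split into four sets

for mode in ("flat", "cache"):
    total = visible_capacity_gb(HBM_GB, DDR_GB, mode)
    per_quadrant = total // QUADRANTS
    print(f"{mode}: {total} GB total, {per_quadrant} GB per quadrant")
```

With a different DIMM size, only `DDR_GB` changes, which is why the installed DDR5 amount is effectively another dimension on top of the mode choice.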
Adding that wrinkle, our 2×2 matrix becomes a 2×3 matrix, as we have to split the with-DDR5 case into cache mode and flat mode.
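To make the resulting matrix concrete, here is a small Python sketch that enumerates the six combinations. The labels are informal descriptions chosen for this example, not official BIOS option names.

```python
from itertools import product

# Two ways to partition the package (labels are informal, not BIOS names).
PARTITIONING = ("single domain", "four quadrants")

# Three memory setups: no DDR5 at all, or DDR5 in flat or caching mode.
MEMORY_MODES = ("HBM only (no DDR5)", "HBM flat mode", "HBM caching mode")

# The Cartesian product gives the 2x3 matrix of configurations.
configs = list(product(PARTITIONING, MEMORY_MODES))

for partitioning, memory_mode in configs:
    print(f"{partitioning:15s} | {memory_mode}")

print(f"{len(configs)} configurations in total")
```

The amount of DDR5 installed would add yet another axis to this product, which is the dimension the article chooses to skip.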