One increment of functionality/scale-out at a time…
Theoretically, AMD’s MI450X IF128 can have an edge over Nvidia’s VR200 NVL144. However, its complexity and technical challenges may limit its initial success.
The Instinct MI450X IF128 will be AMD’s first system to connect AI processors across two racks using Infinity Fabric extended over Ethernet. The machine will rely on 1U servers (16 per rack, 32 in total), each running one AMD EPYC ‘Venice’ CPU and four Instinct MI450X GPUs equipped with their own LPDDR memory pools and PCIe x4 SSDs. Each of the 128 GPUs will have over 1.8 TB/s of unidirectional bandwidth for inter-GPU communication within the same scale-up domain, enabling significantly larger compute clusters than AMD has supported so far.
For scale-out communication beyond the local group of GPUs (i.e., between MI450X IF128 machines), the system will include up to three 800GbE Pensando network cards per GPU, for a total outbound network bandwidth of 2.4 Tb/s per device. A secondary configuration will also be available in which each GPU uses two 800GbE network cards attached over a PCIe interface. However, this version will not be able to use the full bandwidth of its interfaces, as the PCIe 5.0 links cannot fully feed two such high-speed network cards.
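The bandwidth mismatch in the two-card configuration is easy to check with back-of-the-envelope numbers. The sketch below assumes PCIe 5.0 signaling at 32 GT/s per lane with 128b/130b encoding and an x16 link; these are standard PCIe 5.0 figures, not details AMD has confirmed about this system's topology.

```python
# Illustrative arithmetic only: why a PCIe 5.0 link cannot saturate
# two 800GbE NICs. Lane count and topology are assumptions.
GT_PER_LANE = 32.0        # PCIe 5.0 raw signaling rate per lane (GT/s)
ENCODING = 128 / 130      # 128b/130b line-encoding efficiency
LANES = 16                # assumed x16 link

pcie5_x16_gbps = GT_PER_LANE * ENCODING * LANES  # ~504 Gb/s per direction
nic_gbps = 800                                   # one 800GbE port
needed_gbps = 2 * nic_gbps                       # two NICs per GPU

print(f"PCIe 5.0 x16 usable: ~{pcie5_x16_gbps:.0f} Gb/s per direction")
print(f"Two 800GbE NICs need: {needed_gbps} Gb/s")
print(f"Shortfall factor: {needed_gbps / pcie5_x16_gbps:.1f}x")
```

Even before protocol overheads, a single PCIe 5.0 x16 link tops out at roughly 504 Gb/s per direction, well short of one 800GbE port, let alone two, which is why the two-card variant cannot run its interfaces at full rate.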
Unlike Nvidia’s GB200-series systems, which use active optical cables with embedded components to connect racks, AMD will employ a simpler passive copper wiring approach. This strategy may help reduce system cost and power consumption, but could be limited by signal integrity or cable length constraints.
Moreover, given the system’s complexity, manufacturing and deployment may face delays or technical issues. To hedge this risk, AMD is preparing a smaller version of the same architecture called the MI450X IF64. This variant will be confined to a single rack and use a simplified interconnect design, which promises a more predictable rollout.
If AMD manages to execute this architecture successfully, it could strengthen its position in the AI compute market, particularly in AI inference systems. Whether it can genuinely challenge Nvidia, though, remains to be seen.