Nvidia-adjacent companies

Last month, Nvidia gave us a look at Rubin-based servers and the six kinds of chips inside them:

It’s a modular server-rack design: Nvidia-made boards containing the GPUs and CPUs, plus networking boards, make up a compute tray (it looks like 1U but might be 2U in height), with the boards as modules that snap into place. This was done to eliminate cabling, hoses, and fans.

In between those compute trays are trays for the NVLink (scale-up) connectivity, and while separate, I believe those are required and can’t be replaced by third-party options.

Finally, there is the external connectivity (scale-out), which in Nvidia’s reference design uses their Spectrum-X photonics Ethernet switch. This could be replaced in shipping units with something from Arista or Broadcom.

CNBC has an Nvidia advertisement disguised as a news video, but it’s worth watching just for more details as well as more looks at the components:

It lists the 80 third parties that make components in Nvidia’s server reference design. Astera, for instance, is not listed.

The latest Rubins are liquid cooled, with components from Delta Electronics and Vertiv.

There’s also a peek near the end of the video at the upcoming Kyber racks, which reduce cabling even more.

Now, while Nvidia doesn’t sell complete racks, it does sell the boards and snap-in modules, and probably the trays. Third-party companies like Dell, SMCI, HPE, etc. are the ones you deal with to lay out racks and equip them to your specifications, which most notably includes the scale-out networking.

I’m wondering how many of Astera’s offerings end up in the Rubin-based servers being delivered. Astera is a member of the NVLink Fusion ecosystem, and so could be making components for the internal GPU-to-GPU networking, but it seems to me that Nvidia is optimizing its boards to reduce retimer use.

And so, for Astera anyway, I think the future increasingly lies with non-Nvidia installations, such as Anthropic using Amazon’s Trainium chips and, of course, AMD installations using UALink.

I don’t really have any conclusions to share, but this highly technical space is interesting and it’s getting harder to predict the winners and losers.

EDIT: Bad timing on my part: SemiAnalysis just came out with an even more detailed article:


Didn’t make it through the piece but I did search for “Astera” (no hits) and “Credo” (one hit…maybe you can translate to English?)

Meta will not be the only hyperscaler using 1.6T AECs for its VR200 deployments, however. We think xAI will use 1.6T AECs for both NIC-to-TOR and switch-to-switch connectivity at the leaf, spine and core layers. It will be a single-plane network replacing most 1.6T transceivers at the switch boxes – and this can give Credo plenty of pricing power.

I like the last phrase, anyway!

Bear


Meta and xAI are companies building out massive data centers based around Nvidia GPU server racks.

AEC - Active Electrical Cable. Not just a wire; it has signal-conditioning electronics built in. Both Astera and Credo design/make these. The “1.6T” is 1.6 terabits of data per second, which means it’s really fast.
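To put “1.6T” in perspective, here’s a back-of-the-envelope conversion (a quick sketch, using the decimal SI units that networking specs use):

```python
# Convert the 1.6 terabit/s link rate into bytes per second.
terabits_per_second = 1.6
bits_per_second = terabits_per_second * 1e12   # SI (decimal) terabits
bytes_per_second = bits_per_second / 8         # 8 bits per byte
gigabytes_per_second = bytes_per_second / 1e9
print(gigabytes_per_second)  # 200.0 GB/s per cable
```

That’s 200 gigabytes per second through a single cable, before any protocol overhead.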

VR200 - The “VR” stands for “Vera Rubin,” and is the successor to the [Edited] Nvidia Grace Blackwell (GB200), which are shipping in volume today. These are combinations of CPUs and GPUs (CPUs being “Grace” today and “Vera” starting later this year, and GPUs being “Blackwell” today and “Rubin” later) on a board.

NIC-to-TOR - Network Interface Card to Top Of Rack. The TOR is essentially an Ethernet switch, from which you can connect to the switches at the top of other server racks.
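A toy sketch (illustrative only, not Nvidia’s or Meta’s actual topology) of where the link types in the quoted paragraph sit in a leaf/spine/core network:

```python
# Each tuple is one link; the comment notes which term from the quote it matches.
hops = [
    ("server NIC", "TOR/leaf switch"),    # "NIC-to-TOR" link
    ("TOR/leaf switch", "spine switch"),  # "switch-to-switch" at the leaf/spine layer
    ("spine switch", "core switch"),      # "switch-to-switch" at the core layer
]
for src, dst in hops:
    print(f"{src} -> {dst}")
```

The quote is saying Credo’s AECs could serve all three of those link types, rather than just the first one.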

I didn’t go back to read the context around the paragraph you quoted, but it appears Semi-Analysis thinks that Credo has products (AECs) that can be used at different levels in the networking stack, and does so in a “single-plane” configuration, which sounds like it has some advantages not just in speed, but in simplicity/uniformity. Hence, giving Credo some pricing power for its superior solution, according to SemiAnalysis.

Did that help?

EDIT: Corrected the VR200 section to clarify that Vera Rubin is a combination of CPU and GPU, not just a GPU. These are paired together in a 2 GPU to 1 CPU ratio so that the server can do both the Neural Net matrix calculations as well as “normal” compute that any server must do, including feeding data in and out. Nvidia designs its own ARM-based CPUs, and these are so good that Meta reportedly also buys Grace/Vera CPU-only servers from Nvidia since the compute per watt is high.


HAHAHA…I rarely feel THIS out of touch in the world of tech…dang.

THANK YOU SO MUCH. :slight_smile: I’m just an IT PM these days, not on the front lines so much anymore.


Made a small edit to my post above to clarify that the VR200 is a CPU/GPU combination, not just a GPU.
