NVIDIA's GTC announcements

The big announcements from the NVIDIA GTC, IMHO:

Data Center segment announcements:

  • new Blackwell chips, the next generation of the Hopper AI Tensor Core GPUs, coming in B100 and B200 flavors that easily swap in for H100s
  • new NVLink Switch chip (in-network compute) to greatly speed up GPU interconnects (shared memory)
  • new Grace Blackwell (GB200) superchip (CPU + 2x GPU), the next-gen of Grace Hopper (GH200), now housing 2 GPUs vs 1
  • new DGX GB200 NVL72 supercomputer, a single-rack cluster of 36x GB200 superchips (72 B200 GPUs) delivering an exaFLOP/s of inference (see the quick arithmetic after this list)
  • new 800G InfiniBand (Quantum) and Ethernet (Spectrum-X) switches and NICs, doubling AI supercomputer networking speeds
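
For a sanity check on that exaFLOP/s-per-rack figure, here is a quick back-of-envelope. The ~20 PFLOP/s of FP4 inference per B200 is my read of the Blackwell spec sheet, not a number from the keynote, so treat it as an assumption:

```python
# Back-of-envelope for the "exaFLOP/s in a rack" claim.
B200_FP4_PFLOPS = 20     # assumed ~20 petaFLOP/s per GPU at FP4 inference
GPUS_PER_RACK = 72       # 36 GB200 superchips x 2 B200 GPUs each

rack_pflops = B200_FP4_PFLOPS * GPUS_PER_RACK
print(f"{rack_pflops} PFLOP/s = {rack_pflops / 1000:.2f} exaFLOP/s")   # ~1.44 exaFLOP/s
```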

Blackwell (and likely all the rest above) is coming late 2024 (~Nov?), per the CFO on the IR day and hints in the keynote. So it is most likely going to start contributing in fiscal Q425. Surprisingly, the CEO noted how he was telling major customers to wait for the GB200 instead of getting the GH200 shipping in Q2 this year (~June?) due to the improvements. Of course, those improvements are largely driven by it having 2 GPUs vs 1 (and so over double the price tag).

These new Blackwell systems have a number of improvements over Hopper, including better chip performance, dual GPUs in the superchips, and hugely improved interconnect throughput and shared memory (via the new NVLink Switch). They also announced a new FP4 (4-bit floating point) format for inference, which trades precision for much higher throughput. These factors all combine into the 30x faster inference shown over the H100.
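
To make that FP4 trade-off concrete, here is a minimal sketch of quantizing weights onto a 4-bit floating-point grid. I'm using the E2M1 value set from the open MX FP4 spec purely as an illustration; NVIDIA didn't detail its exact format in the keynote, so the grid and the per-tensor scaling here are assumptions:

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
# Illustrative assumption; Blackwell's actual FP4 details may differ.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(weights: np.ndarray) -> np.ndarray:
    """Snap each weight to the nearest FP4 value, using a simple per-tensor scale."""
    scale = np.abs(weights).max() / FP4_GRID.max()   # map the largest weight to 6.0
    idx = np.abs(np.abs(weights)[:, None] / scale - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(weights) * FP4_GRID[idx] * scale

w = np.random.randn(8).astype(np.float32)
print(w)
print(quantize_fp4(w))   # only 16 distinct values -> less precision, far less memory/bandwidth per number
```

Half the bits of FP8 means roughly twice as many numbers moved and multiplied per cycle, which (along with the new NVLink fabric) is where much of that 30x comes from.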

The CEO keynote showed how it takes 1/4 as many B100 GPUs to train a GPT-4-size model (the 1.8T-parameter MoE GPT referenced) in the same time period as an H100 cluster would… leading to 1/4 the energy use.
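
Rough math on that claim, using my recollection of the keynote's figures (roughly 8,000 H100s drawing ~15 MW vs roughly 2,000 Blackwell GPUs drawing ~4 MW, both over ~90 days); treat all of these as approximate:

```python
# Ballpark energy comparison using the keynote's approximate figures (my recollection, not exact).
days = 90
h100_gpus, blackwell_gpus = 8000, 2000   # assumed cluster sizes
h100_mw, blackwell_mw = 15, 4            # assumed total cluster power draw (MW)

print(blackwell_gpus / h100_gpus)                        # 0.25 -> 1/4 the GPUs
print(h100_mw * 24 * days, "MWh vs", blackwell_mw * 24 * days, "MWh")
print(blackwell_mw / h100_mw)                            # ~0.27 -> roughly 1/4 the energy
```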

Software:

  • NVIDIA Inference Microservices (NIM) add a modularized app stack (via Kubernetes containers) for deploying AI apps to production. These wrap OSS models like Llama 2 and Mistral, as well as provide tools needed in AI app stacks (see the deployment sketch after this list).
  • new NeMo microservices for LLMOps around managing, monitoring, and measuring Generative AI & LLMs (in early access)
  • CUDA-X microservices for data processing & RAG, as well as AI models for speech/translation, weather simulation, and routing optimization
  • announced RAG tools, including NeMo Retriever (previously announced) for building RAG processes over proprietary data, plus a newly announced “RAG LLM Operator” to deploy RAG into production as a NIM
  • the release of AI Enterprise v5.0, with all the NIMs & microservices above
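
As an example of what the NIM packaging means in practice, here is a minimal sketch of calling a locally deployed NIM container from Python. NIMs expose an OpenAI-style chat completions API, but the URL, port, and model id below are my assumptions for illustration, so check the container's docs for the real values:

```python
import requests

# Hypothetical local NIM endpoint; URL, port, and model id are placeholders.
NIM_URL = "http://localhost:8000/v1/chat/completions"

resp = requests.post(
    NIM_URL,
    json={
        "model": "meta/llama2-70b",   # placeholder model id
        "messages": [{"role": "user", "content": "Summarize the Blackwell announcements from GTC."}],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```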

Partners like Snowflake and CrowdStrike were mentioned as customers already using these solutions to create their own AI copilots. NIM is part of NVIDIA's AI Enterprise suite, which sells for $4.5K/year/GPU.

NVIDIA is making it easy to deploy AI apps from modular pre-built components, but is also opening AI Enterprise up to be a centralized platform that works across an ecosystem of partners, including AWS SageMaker, GCP Vertex AI, Azure ML, and MLOps platforms like Dataiku and DataRobot. (I expect Snowflake to add NIM support in their Container Services too.)

Lots of other interesting announcements too beyond the Data Center ones above:

  • Omniverse Cloud is now API-based, pushing it toward being a centralized platform & ecosystem across 3rd-party tools for industrial digitization, automotive, 3D visualization, smart factories/manufacturing, and robotics platforms.
  • Omniverse Cloud was also extended into an end-to-end platform for the coming wave of humanoid robots, including the new GR00T robotic AI, Jetson Thor robot controller modules, and the Osmo controller & AI integrator (which syncs the centralized AI to the robot's onboard AI).

-muji

Muji,
In a different thread you seemed to diss SMCI. Would you mind elaborating a bit on that? What do you think a more reasonable price for their stock would be?
