Some good-sounding info from The Information.
This should open a window for AMD to pick up some more business, and to be further along with successors to the current MI300 by the time Blackwell finally shows up.
The Blackwell design problem came up in recent weeks, as engineers at TSMC discovered flaws in preparation for mass production, said the two people involved with the Blackwell chip production.
The GB200 chips contain two connected Blackwell GPUs alongside a Grace central processing unit. The problem involved a processor die—a piece of silicon that holds circuits for a chip—that connected the two Blackwell GPUs. The snag decreased the yield, or number of chips TSMC was able to produce for Nvidia. Such problems typically prompt companies to stop production.
As a result, Nvidia has been making adjustments to the design and will have to conduct a new production test run at TSMC before mass production can begin, the people said.
Nvidia told at least one cloud provider that it might consider producing a version of the chip that only contains one Blackwell chip, in an effort to avoid the die issue and ship chips faster, according to someone who spoke with Nvidia about the delay.
TSMC initially planned to start mass production of the Blackwell chips in the third quarter and ship them en masse to Nvidia customers starting in the fourth quarter. The Blackwell chips are now expected to go into mass production in the fourth quarter, with the servers slated for mass shipment in the subsequent quarters if no further issue arises, they said.
…Still, it is highly unusual to uncover significant design flaws right before mass production. Chip designers typically work with chip makers like TSMC to conduct multiple production test runs and simulations to ensure the viability of the product and a smooth manufacturing process before taking large orders from customers.
It’s also uncommon for TSMC, the world’s largest chipmaker, to halt its production lines and go back to the drawing board with a high-profile product that’s so close to mass production, according to two TSMC employees. TSMC has freed up machine capacity in anticipation of the mass production of GB200s but will have to let its machinery sit idle until the snags are fixed.
The design flaw will also impact the production and delivery of Nvidia’s NVLink server racks because the companies that work on the servers have to wait for a new chip sample before finalizing a server rack design.