SemiAnalysis.com has a long new article on datacenter cooling that’s worth a skim for potential and existing NVDA and SMCI investors. They gather data directly and indirectly from industry players, and even use drone footage to see what’s installed outside datacenter buildings.
Demand for liquid cooling is underestimated and will lead to an increase in inefficient “bridge” solutions, as there won’t be enough liquid-cooling-capable datacenters.
• Nvidia shook the entire Datacenter Industry in March when it announced that its state-of-the-art AI computing platform would be a 120kW, 72-GPU rack exclusively cooled via Direct-to-Chip Liquid Cooling (DLC)… DLC is not a new technology, but it has long been confined to cost-insensitive R&D government supercomputers that operate at >100kW rack density – and Google’s custom AI infrastructure.
• Cooling is the second largest capital expense for a datacenter, after electrical.
• Google and Meta operate very power-efficient datacenters, while Microsoft and AWS are less efficient. 60%-80% of that efficiency gap comes down to cooling.
• Meta’s high-efficiency design takes longer to build, and because time-to-compute is now so important, Meta has scrapped that system in favor of faster-to-build but less efficient designs.
• GenAI datacenters have very different cooling requirements from earlier generations, so while the hyperscalers were previously able to make air cooling efficient, that’s no longer the case with AI workloads.
• "We believe that the real drivers behind liquid cooling adoption are still misunderstood, and so is the future of cooling systems for inference vs. training datacenters."
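The efficiency claims above can be made concrete with PUE (Power Usage Effectiveness) arithmetic. A minimal sketch, with assumed numbers (the specific PUE values below are illustrative, not from the article; only the 60%-80% cooling share of overhead comes from the summary above):

```python
# PUE = total facility power / IT equipment power; overhead = PUE - 1.
# All PUE figures here are hypothetical examples, not reported values.
def cooling_share_of_total(pue: float, cooling_fraction_of_overhead: float) -> float:
    """Fraction of total facility power spent on cooling, given PUE and
    cooling's share of the non-IT overhead."""
    overhead = pue - 1.0  # non-IT watts per watt of IT load
    cooling = overhead * cooling_fraction_of_overhead
    return cooling / pue  # as a share of total facility power

# Hypothetical comparison: an efficient site (PUE ~1.10) vs a less
# efficient one (PUE ~1.50), taking cooling as 70% of overhead
# (midpoint of the 60%-80% range cited above).
for name, pue in [("efficient", 1.10), ("less efficient", 1.50)]:
    share = cooling_share_of_total(pue, 0.70)
    print(f"{name}: PUE={pue:.2f}, cooling = {share:.1%} of total power")
```

Under these assumptions, cooling's bite grows quickly with PUE (roughly 6% of total power at PUE 1.10 vs. over 20% at PUE 1.50), which is why cooling ranks as the second-largest capex item and why the air-vs-liquid choice matters so much at 120kW racks.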
Unfortunately, the rest of the article is behind a paywall.