Long paper on data center cooling

SemiAnalysis.com has a new long article on datacenter cooling that’s worth a skim for potential/existing NVDA and SMCI investors. They gather data directly and indirectly from industry players, and even use drone footage to see what’s installed outside the datacenter buildings.

Demand for liquid cooling is underestimated, and because there won’t be enough liquid-cooling-capable datacenters, inefficient “bridge” solutions will become more common.

• Nvidia shook the entire Datacenter Industry in March when it announced that its state-of-the-art AI computing platform would be a 120kW, 72-GPU rack exclusively cooled via Direct-to-Chip Liquid Cooling (DLC)… DLC is not a new technology, but it has long been confined to cost-insensitive R&D government supercomputers that operate at >100kW rack density – and Google’s custom AI infrastructure.
• Cooling is the second largest capital expense for a datacenter, after electrical.
• Google and Meta operate very power-efficient datacenters, while Microsoft and AWS are less efficient. 60%-80% of that is cooling.
• Meta’s high-efficiency design takes longer to build, and because getting compute online quickly is so important, Meta has scrapped that design in favor of less efficient systems.
• GenAI datacenters have very different cooling requirements from earlier datacenters, so while the hyperscalers were previously able to make air cooling efficient, that’s no longer the case with AI.
• "We believe that the real drivers behind liquid cooling adoption are still misunderstood, and so is the future of cooling systems for inference vs training datacenters."

Unfortunately, the rest of the article is behind a paywall.

Yep, some mighty big corporations are going to have to space GPUs farther away from each other than is ideal, or risk losing first-mover advantage.

IMO the actionable insight here is that inefficient GPU density, driven by a lack of available liquid cooling, is bullish for companies like $ALAB.

SIDE NOTE: Sometimes, IMO, it pays to “go into the weeds,” as I’ve been chided here for doing. Refer to previous conversations on this board where the utility of even learning about liquid cooling was questioned. The liquid-cooling bottleneck in the global GPU ecosystem is an important component of the bull thesis for companies like $ALAB.

Bullish for ALAB, and also for companies such as Pure Storage, whose cloud storage requires far less energy than traditional technology.
