NVIDIA’s New Chips will Drive the ‘Second Cooling Revolution’ in Data Centres

By Peter Huang

In the evolving landscape of artificial intelligence, a transformation is underway in how we manage the enormous amounts of heat created by computing. As AI systems grow more pervasive and more powerful, traditional cooling methods are struggling to keep up, creating an urgent need for new approaches to thermal management in data centres worldwide.

The AI Power Surge

The computational demands of AI systems have skyrocketed in recent years. Training advanced large language models requires massive processing power, with leading systems now drawing power measured in megawatts rather than kilowatts. Newly constructed hyperscale data centres require power capacities of at least 100 megawatts, and this exponential growth in power density is creating thermal challenges that conventional air cooling systems cannot address efficiently.

Traditional fan-based cooling systems consume substantial energy themselves and struggle to remove heat effectively from densely packed computing environments; they typically cannot manage racks with densities above 50 kW. Average rack power densities are rising rapidly and are expected to reach 30 kW by 2027, while many data centres already exceed that figure. Training models like ChatGPT, for example, can draw more than 80 kW per rack. Air cooling therefore acts as a bottleneck to AI advancement.
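A rough back-of-envelope sketch shows why air struggles at these rack densities. The figures below assume illustrative textbook values for air (density about 1.2 kg/m³, specific heat about 1005 J/(kg·K)) and a typical allowable inlet-to-outlet temperature rise of 15 K; none of these constants come from the article itself.

```python
# Back-of-envelope: airflow needed to air-cool a rack at a given power.
# Assumed illustrative values (not from the article):
AIR_DENSITY = 1.2   # kg/m^3, air at roughly room temperature
AIR_CP = 1005.0     # J/(kg*K), specific heat of air
DELTA_T = 15.0      # K, allowable inlet-to-outlet temperature rise

def airflow_m3_per_s(rack_power_w: float) -> float:
    """Volumetric airflow required to carry away rack_power_w of heat."""
    mass_flow = rack_power_w / (AIR_CP * DELTA_T)  # kg/s of air
    return mass_flow / AIR_DENSITY                 # m^3/s of air

for kw in (30, 50, 80):
    print(f"{kw} kW rack: ~{airflow_m3_per_s(kw * 1000):.1f} m^3/s of air")
```

Under these assumptions, an 80 kW rack needs over 4 m³ of air pushed through it every second, which is why fan power and acoustic limits become prohibitive well before liquid cooling does.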

Companies have begun to realise that they must adapt to keep pace and remain competitive. NVIDIA’s new Blackwell Ultra chip, its latest AI platform, is the first designed to be fully liquid-cooled, featuring integrated cold plates and optimised thermal interfaces. NVIDIA is a pioneer of AI infrastructure, and where it leads, others are set to follow. We are likely to see industry-wide recognition that liquid cooling is the key enabler of next-generation computing.

Why Liquid Cooling is the Answer

Liquid cooling offers an effective alternative to fan-based cooling infrastructure. Liquids, whether water or specialised coolants, can carry up to 3,000 times more heat than air per unit volume. They can be channelled directly to the heat source, in this case the AI chip, leading to vastly improved thermal management. Direct-to-chip approaches use specialised cold plates to deliver water or coolant precisely to heat-producing components, while immersion methods completely submerge hardware in dielectric fluids. Both techniques enable dramatically higher compute densities, and NVIDIA’s move to a fully liquid-cooled chip marks a much-needed break from outdated cooling infrastructure.
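The per-unit-volume claim can be sanity-checked with standard textbook properties. This sketch assumes water at roughly 4186 J/(kg·K) and 1000 kg/m³ against air at roughly 1005 J/(kg·K) and 1.2 kg/m³; these constants are illustrative assumptions, not figures from the article.

```python
# Sanity check: volumetric heat capacity of water vs air.
# Assumed textbook values (illustrative):
water_vol_cp = 4186.0 * 1000.0   # J/(m^3*K): cp * density for water
air_vol_cp = 1005.0 * 1.2        # J/(m^3*K): cp * density for air

ratio = water_vol_cp / air_vol_cp
print(f"Water carries ~{ratio:,.0f}x more heat per unit volume than air")
```

The result lands in the low thousands, which is consistent with the order-of-magnitude "up to 3,000 times" figure quoted for liquid coolants.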

This shift in cooling aligns with industry predictions: new industry research surveying 600 data centre leaders across the globe found that nearly three-quarters (74%) believe immersion cooling is now the only option for data centres to meet current computing power demands. Additionally, the majority of those surveyed believe this switch must happen within the next three years for improvements in AI technology to continue. This urgency reflects a growing recognition that cooling infrastructure is becoming the primary bottleneck in AI advancement, with many organisations already encountering cooling limitations that prevent deployment of the latest hardware or risk system failures. Without widespread adoption of liquid cooling technologies, experts warn that we could face a significant slowdown in AI development as computational demands continue to accelerate.

Although immersion cooling looks set to become the future of data centre cooling technology, the report identifies several barriers slowing industry-wide implementation. The limited availability of immersion-ready equipment remains a primary concern, with many data centre operators struggling to source components specifically designed for liquid environments. This challenge is compounded by infrastructural issues, as existing facilities built around traditional air cooling require costly retrofitting to accommodate immersion systems. Perhaps most critically, knowledge gaps persist throughout the industry: many IT professionals and facility managers lack experience with these systems, creating uncertainty about maintenance procedures, hardware compatibility, and long-term reliability. However, with proper planning and support, the transition to immersion cooling can be managed without significant disruption.

The liquid cooling revolution represents a pivotal moment for data centres, but success requires industry collaboration. Technology providers, data centre operators, and hardware manufacturers must form partnerships focused on rigorous testing and accelerated R&D initiatives to keep pace with AI’s escalating demands. These collaborative efforts will help standardise immersion-ready components, develop implementation best practices, and build the specialised knowledge needed for widespread adoption. As AI continues to evolve, liquid cooling stands not merely as an alternative thermal management approach, but as an imperative for the future of AI. Those who delay adoption risk finding themselves with facilities that cannot support the next generation of computing hardware, creating a competitive disadvantage that will be difficult to overcome. By embracing this transformation today, businesses can ensure the growth of their AI technologies well into the future.

Peter Huang is VP of Data Centre Thermal Management at Castrol

AsiaBizToday