Nvidia’s next-generation Blackwell AI processors are facing a serious challenge: overheating in high-capacity server racks. The problem worries large technology customers such as Google, Meta and Microsoft, because it can cause both lost performance and possible hardware damage. Blackwell GPUs overheat severely in high-density systems that can draw up to 120 kW of power, which has forced Nvidia to make design changes and delay its production schedule.
Do Nvidia’s Blackwell AI chips have a heating problem?
To address these issues, Nvidia is reportedly redesigning its cooling systems and issuing new engineering instructions to suppliers. This process has not only pushed back shipment dates but also hurt the company’s production efficiency. Processors built with TSMC’s CoWoS-L packaging technology had already suffered failures caused by structural problems arising from differences in thermal expansion. Nvidia has announced that it solved these problems by adjusting the processor design, but putting the fixes into production takes time.
Nvidia describes such design revisions as a normal part of the process. Even so, the processors were originally expected to be ready in the second quarter of 2024, and mass production could begin only in October. This means shipments could be delayed until January 2025. Nvidia’s continued collaboration with cloud providers and its development of new solutions could mitigate the long-term impact of these issues.