A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale

Brewer, Wesley, Maiterth, Matthias, Kumar, Vineet, Wojda, Rafal, Bouknight, Sedrick, Hines, Jesse, Shin, Woong, Greenwood, Scott, Grant, David, Williams, Wesley, Wang, Feiyi

arXiv.org Artificial Intelligence 

The framework enables the study of "what-if" scenarios, system optimizations, and virtual prototyping of future systems. Using Frontier as a case study, we demonstrate the framework's capabilities by replaying six months of system telemetry for systematic verification and validation. Such a comprehensive analysis of a liquid-cooled exascale supercomputer is the first of its kind. ExaDigiT elucidates complex transient cooling system dynamics, runs synthetic or real workloads, and predicts energy losses due to rectification and voltage conversion. Throughout our paper, we present lessons learned to benefit HPC practitioners developing similar digital twins. We envision the digital twin will be a key enabler for sustainable, energy-efficient supercomputing.