ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling

Wang, Xiao, Choi, Jong-Youl, Kurihana, Takuya, Lyngaas, Isaac, Yoon, Hong-Jun, Xiao, Xi, Pugmire, David, Fan, Ming, Nafi, Nasik M., Tsaris, Aristeidis, Aji, Ashwin M., Hossain, Maliha, Wahib, Mohamed, Wang, Dali, Thornton, Peter, Balaprakash, Prasanna, Ashfaq, Moetasim, Lu, Dan

arXiv.org Artificial Intelligence 

Sparse observations and coarse-resolution climate models limit effective regional decision-making, underscoring the need for robust downscaling. However, existing AI methods struggle with generalization across variables and geographies and are constrained by the quadratic complexity of Vision Transformer (ViT) self-attention. We introduce ORBIT-2, a scalable foundation model for global, hyper-resolution climate downscaling. ORBIT-2 incorporates two key innovations: (1) Residual Slim ViT (Reslim), a lightweight architecture with residual learning and Bayesian regularization for efficient, robust prediction; and (2) TILES, a tile-wise sequence scaling algorithm that reduces self-attention complexity from quadratic to linear, enabling long-sequence processing and massive parallelism. ORBIT-2 scales to 10 billion parameters across 65,536 GPUs, achieving up to 4.1 exaFLOPS sustained throughput and 74--98% strong scaling efficiency. It supports downscaling to 0.9 km global resolution and processes sequences up to 4.2 billion tokens. On 7 km resolution benchmarks, ORBIT-2 achieves high accuracy with $R^2$ scores in the range of 0.98--0.99 against observational data.
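The abstract does not detail how TILES is implemented, but the core idea it states, replacing global self-attention with attention restricted to fixed-size tiles, can be illustrated with a generic sketch. The code below is an assumption-laden toy (NumPy, single head, non-overlapping tiles, no cross-tile communication), not ORBIT-2's actual algorithm: for a length-n sequence and fixed tile size t, it performs n/t attention computations of cost O(t^2), so total cost grows as O(n·t), linear in n, rather than the O(n^2) of full self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tiled_self_attention(q, k, v, tile=64):
    """Toy tile-local attention: each token attends only within its tile.

    Full attention over n tokens costs O(n^2); with a fixed tile size
    the cost is O(n * tile), i.e. linear in sequence length. The tiles
    are also independent, so they could be processed in parallel, which
    is the kind of structure that enables massive GPU parallelism.
    """
    n, d = q.shape
    out = np.empty_like(v)
    for s in range(0, n, tile):
        e = min(s + tile, n)
        scores = q[s:e] @ k[s:e].T / np.sqrt(d)  # (tile, tile) block
        out[s:e] = softmax(scores) @ v[s:e]
    return out

# Toy usage: 256-token sequence of 16-dim vectors, 64-token tiles.
rng = np.random.default_rng(0)
q = rng.standard_normal((256, 16))
out = tiled_self_attention(q, q, q, tile=64)
print(out.shape)  # (256, 16)
```

Within each tile the result matches dense attention computed on that tile alone; what such local schemes give up (and what a full method like TILES must address) is any interaction between tokens in different tiles.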