CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching

Kannan, Sidharth; Qiu, Tian; Cuesta-Lazaro, Carolina; Jeong, Haewon

arXiv.org Artificial Intelligence 

The large-scale structure of the Universe provides one of the most stringent tests of gravity on cosmological scales. Over the past decades, the ΛCDM cosmological model has emerged as the standard framework for understanding our cosmos, where Λ represents the cosmological constant (associated with dark energy) and CDM denotes cold dark matter; together these two components comprise approximately 95% of the Universe's energy budget. Theoretical predictions of ΛCDM can now be implemented with remarkable precision in numerical simulations, which capture the formation of the cosmic web: an intricate network where galaxies reside in dense clusters, connected by filamentary structures and separated by vast cosmic voids. This success, however, presents cosmology with a new challenge. High-resolution simulations like AbacusSummit generate datasets exceeding 2000 TB, which severely constrains our ability to scale training datasets for machine learning applications. Moreover, extracting meaningful insights from these high-dimensional datasets requires models that can effectively navigate the curse of dimensionality.