Lovell, Christopher C.
LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology
Ho, Matthew, Bartlett, Deaglan J., Chartier, Nicolas, Cuesta-Lazaro, Carolina, Ding, Simon, Lapel, Axel, Lemos, Pablo, Lovell, Christopher C., Makinen, T. Lucas, Modi, Chirag, Pandya, Viraj, Pandey, Shivam, Perez, Lucia A., Wandelt, Benjamin, Bryan, Greg L.
This paper presents the Learning the Universe Implicit Likelihood Inference (LtU-ILI) pipeline, a codebase for rapid, user-friendly, and cutting-edge machine learning (ML) inference in astrophysics and cosmology. The pipeline includes software for implementing various neural architectures, training schemes, priors, and density estimators in a manner easily adaptable to any research workflow. It includes comprehensive validation metrics to assess posterior estimate coverage, enhancing the reliability of inferred results. Additionally, the pipeline is easily parallelizable and designed for efficient exploration of modeling hyperparameters. To demonstrate its capabilities, we present real applications across a range of astrophysics and cosmology problems, such as estimating galaxy cluster masses from X-ray photometry; inferring cosmology from matter power spectra and halo point clouds; characterising progenitors in gravitational wave signals; capturing physical dust parameters from galaxy colors and luminosities; and establishing properties of semi-analytic models of galaxy formation. We also include exhaustive benchmarking and comparisons of all implemented methods, as well as discussions about the challenges and pitfalls of ML inference in astronomical sciences. All code and examples are made publicly available at https://github.com/maho3/ltu-ili.
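The posterior-coverage validation mentioned above can be illustrated with a minimal rank-statistic check, a standard simulation-based-calibration idea: if posterior estimates are well calibrated, the rank of each true parameter among its posterior draws is uniformly distributed. This is a self-contained toy sketch using an analytically known Gaussian posterior, not the LtU-ILI API; all variable names and numbers here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: parameter theta drawn from a standard-normal prior,
# data x = theta + unit Gaussian noise.
n_sims, n_posterior_draws = 1000, 100
theta_true = rng.normal(size=n_sims)
x_obs = theta_true + rng.normal(size=n_sims)

# Exact posterior for this toy model: theta | x ~ N(x/2, 1/2).
post_mean, post_std = x_obs / 2.0, np.sqrt(0.5)
samples = rng.normal(post_mean[:, None], post_std,
                     size=(n_sims, n_posterior_draws))

# Rank of the true parameter among its posterior draws; for a
# calibrated posterior these ranks are uniform on {0, ..., 100}.
ranks = (samples < theta_true[:, None]).sum(axis=1)
hist, _ = np.histogram(ranks, bins=10)
print(hist)  # roughly uniform across bins
```

An over- or under-confident posterior estimator would instead produce a U-shaped or peaked rank histogram, which is what coverage diagnostics are designed to flag.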
Field-level simulation-based inference with galaxy catalogs: the impact of systematic effects
de Santi, Natalí S. M., Villaescusa-Navarro, Francisco, Abramo, L. Raul, Shao, Helen, Perez, Lucia A., Castro, Tiago, Ni, Yueying, Lovell, Christopher C., Hernandez-Martinez, Elena, Marinacci, Federico, Spergel, David N., Dolag, Klaus, Hernquist, Lars, Vogelsberger, Mark
It has recently been shown that a powerful way to constrain cosmological parameters from galaxy redshift surveys is to train graph neural networks to perform field-level likelihood-free inference without imposing cuts on scale. In particular, de Santi et al. (2023) developed models that could accurately infer the value of $\Omega_{\rm m}$ from catalogs containing only the positions and radial velocities of galaxies, and that are robust to uncertainties in astrophysics and subgrid models. However, observations are affected by many effects, including 1) masking, 2) uncertainties in peculiar velocities and radial distances, and 3) different galaxy selections. Moreover, observations only allow us to measure redshift, intertwining galaxies' radial positions and velocities. In this paper we train and test our models on galaxy catalogs, created from thousands of state-of-the-art hydrodynamic simulations run with different codes from the CAMELS project, that incorporate these observational effects. We find that, although these effects degrade the precision and accuracy of the models and increase the fraction of catalogs where the model breaks down, the model still performs well on over 90% of the galaxy catalogs, demonstrating the potential of these models to constrain cosmological parameters even when applied to real data.
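The observational effects listed above can be sketched on a toy mock catalog: redshift measurements mix the radial position and peculiar velocity (the redshift-space coordinate is $s = z + v_r/(aH)$ along the line of sight), velocities carry measurement noise, and a mask removes part of the footprint. This is a hedged illustration with invented numbers, not the CAMELS pipeline or the paper's actual preprocessing.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy galaxy catalog in a small periodic box (comoving Mpc/h).
box = 25.0
n_gal = 1000
pos = rng.uniform(0.0, box, size=(n_gal, 3))   # true comoving positions
vel = rng.normal(0.0, 300.0, size=n_gal)       # line-of-sight peculiar velocities [km/s]

# Redshift-space position along the z-axis: the observed radial
# coordinate intertwines position and velocity, s = z + v_r / (a H).
aH = 100.0  # a*H at z=0 in km/s per Mpc/h
s = (pos[:, 2] + vel / aH) % box

# Velocity uncertainties: add illustrative measurement noise.
vel_obs = vel + rng.normal(0.0, 50.0, size=n_gal)

# Masking: drop galaxies in a masked slab of the footprint.
keep = ~((pos[:, 0] > 10.0) & (pos[:, 0] < 12.0))
print(keep.sum(), "galaxies survive the mask")
```

Training directly on catalogs degraded in this way is what lets the models quoted above tolerate the same effects in real data.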
Robust Field-level Likelihood-free Inference with Galaxies
de Santi, Natalí S. M., Shao, Helen, Villaescusa-Navarro, Francisco, Abramo, L. Raul, Teyssier, Romain, Villanueva-Domingo, Pablo, Ni, Yueying, Anglés-Alcázar, Daniel, Genel, Shy, Hernandez-Martinez, Elena, Steinwandel, Ulrich P., Lovell, Christopher C., Dolag, Klaus, Castro, Tiago, Vogelsberger, Mark
We train graph neural networks to perform field-level likelihood-free inference using galaxy catalogs from state-of-the-art hydrodynamic simulations of the CAMELS project. Our models are rotational, translational, and permutation invariant and do not impose any cut on scale. From galaxy catalogs that only contain the 3D positions and radial velocities of $\sim 1,000$ galaxies in tiny $(25~h^{-1}{\rm Mpc})^3$ volumes, our models can infer the value of $\Omega_{\rm m}$ with approximately $12\%$ precision. More importantly, by testing the models on galaxy catalogs from thousands of hydrodynamic simulations, each having a different efficiency of supernova and AGN feedback, run with five different codes and subgrid models (IllustrisTNG, SIMBA, Astrid, Magneticum, SWIFT-EAGLE), we find that our models are robust to changes in astrophysics, subgrid physics, and subhalo/galaxy finder. Furthermore, we test our models on $1,024$ simulations that cover a vast region of parameter space (variations in $5$ cosmological and $23$ astrophysical parameters), finding that the model extrapolates remarkably well. Our results indicate that the key to building a robust model is the use of both galaxy positions and velocities, suggesting that the networks have likely learned an underlying physical relation that does not depend on galaxy formation and is valid on scales larger than $\sim 10~h^{-1}{\rm kpc}$.
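The permutation invariance claimed above can be demonstrated with a minimal DeepSets-style sketch: applying a per-point network and then a symmetric (sum) pooling makes the embedding independent of the ordering of galaxies in the catalog. This toy numpy example only illustrates that property; it is not the paper's graph neural network, and all weights and sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def deepsets_embed(points, W1, W2):
    """Permutation-invariant embedding: per-point MLP, then a sum pool.

    Summing over the point axis makes the output independent of the
    ordering of the input catalog (permutation invariance).
    """
    h = np.maximum(points @ W1, 0.0)   # per-point features (ReLU)
    pooled = h.sum(axis=0)             # symmetric aggregation
    return np.maximum(pooled @ W2, 0.0)

# Toy "catalog": 3D position + radial velocity as a 4-vector per galaxy.
cat = rng.normal(size=(100, 4))
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 8))

out = deepsets_embed(cat, W1, W2)
out_shuffled = deepsets_embed(rng.permutation(cat), W1, W2)
print(np.allclose(out, out_shuffled))  # True: galaxy ordering does not matter
```

Rotational and translational invariance are obtained analogously in such architectures by building features from relative distances and angles rather than raw coordinates.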