
Collaborating Authors: villaescusa-navarro


Cosmology with Persistent Homology: Parameter Inference via Machine Learning

Calles, Juan, Yip, Jacky H. T., Contardo, Gabriella, Noreña, Jorge, Rouhiainen, Adam, Shiu, Gary

arXiv.org Artificial Intelligence

Building upon [2308.02636], this article investigates the potential constraining power of persistent homology for cosmological parameters and primordial non-Gaussianity amplitudes in a likelihood-free inference pipeline. We evaluate the ability of persistence images (PIs) to infer parameters relative to the combined power spectrum and bispectrum (PS/BS), and we compare two types of models: neural-based and tree-based. PIs consistently lead to better predictions than the combined PS/BS when the parameters can be constrained (i.e., for $\{\Omega_{\rm m}, \sigma_8, n_{\rm s}, f_{\rm NL}^{\rm loc}\}$). PIs perform particularly well for $f_{\rm NL}^{\rm loc}$, showing the promise of persistent homology for constraining primordial non-Gaussianity. Our results show that combining PIs with the PS/BS provides only marginal gains, indicating that the PS/BS contains little information complementary to the PIs. Finally, we provide a visualization of the most important topological features for $f_{\rm NL}^{\rm loc}$ and for $\Omega_{\rm m}$. This reveals that clusters and voids (0-cycles and 2-cycles) are most informative for $\Omega_{\rm m}$, while $f_{\rm NL}^{\rm loc}$ uses the filaments (1-cycles) in addition to the other two types of topological features.
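For readers unfamiliar with the summary statistic at the centre of this abstract, a persistence image is built by mapping each (birth, death) point of a persistence diagram to (birth, persistence), smearing it with a Gaussian, weighting it (typically by persistence), and summing on a pixel grid. A minimal numpy sketch, where the grid resolution and bandwidth are illustrative choices, not the paper's settings:

```python
import numpy as np

def persistence_image(diagram, res=20, sigma=0.5):
    """Rasterize a persistence diagram into a persistence image.

    diagram: array of (birth, death) pairs for one homology dimension.
    Each point is mapped to (birth, persistence), smeared with a
    Gaussian of width sigma, weighted by its persistence, and summed
    on a res x res grid.
    """
    pts = np.asarray(diagram, dtype=float)
    birth = pts[:, 0]
    pers = pts[:, 1] - pts[:, 0]               # persistence = death - birth
    # Grid covering the transformed diagram (padding is arbitrary)
    bx = np.linspace(birth.min() - 1, birth.max() + 1, res)
    py = np.linspace(0, pers.max() + 1, res)
    X, Y = np.meshgrid(bx, py)
    img = np.zeros((res, res))
    for b, p in zip(birth, pers):
        img += p * np.exp(-((X - b) ** 2 + (Y - p) ** 2) / (2 * sigma ** 2))
    return img

# Toy diagram: three topological features with (birth, death) values
diagram = [(0.2, 1.0), (0.5, 0.7), (0.1, 2.0)]
pi = persistence_image(diagram)
print(pi.shape)  # (20, 20)
```

In the pipeline described above, one such image is produced per homology dimension (0-, 1-, and 2-cycles) and the stack is fed to the inference model.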


Field-level simulation-based inference with galaxy catalogs: the impact of systematic effects

de Santi, Natalí S. M., Villaescusa-Navarro, Francisco, Abramo, L. Raul, Shao, Helen, Perez, Lucia A., Castro, Tiago, Ni, Yueying, Lovell, Christopher C., Hernandez-Martinez, Elena, Marinacci, Federico, Spergel, David N., Dolag, Klaus, Hernquist, Lars, Vogelsberger, Mark

arXiv.org Artificial Intelligence

It has recently been shown that a powerful way to constrain cosmological parameters from galaxy redshift surveys is to train graph neural networks to perform field-level likelihood-free inference without imposing cuts on scale. In particular, de Santi et al. (2023) developed models, robust to uncertainties in astrophysics and subgrid modeling, that could accurately infer the value of $\Omega_{\rm m}$ from catalogs containing only the positions and radial velocities of galaxies. However, observations are affected by many effects, including 1) masking, 2) uncertainties in peculiar velocities and radial distances, and 3) different galaxy selections. Moreover, observations only allow us to measure redshift, intertwining galaxies' radial positions and velocities. In this paper we train and test our models on galaxy catalogs that incorporate these observational effects, created from thousands of state-of-the-art hydrodynamic simulations run with different codes from the CAMELS project. We find that, although these effects degrade the precision and accuracy of the models and increase the fraction of catalogs where the model breaks down, the model still performs well on over 90% of the galaxy catalogs, demonstrating the potential of these models to constrain cosmological parameters even when applied to real data.
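The intertwining of radial positions and velocities mentioned above is the standard redshift-space mapping: the observed line-of-sight coordinate is shifted by the peculiar-velocity term $v_r/(aH)$. A minimal sketch in the plane-parallel approximation, with illustrative box size and units (not the paper's exact setup):

```python
import numpy as np

def to_redshift_space(pos, vel, H=100.0, a=1.0, box=25.0, axis=2):
    """Shift comoving positions along the line of sight by v_r / (a H),
    with periodic wrapping, mimicking what a redshift measurement mixes.

    pos : (N, 3) comoving positions in Mpc/h
    vel : (N, 3) peculiar velocities in km/s
    H   : Hubble rate in h km/s/Mpc (100 at z=0 in these units)
    """
    s = pos.copy()
    s[:, axis] += vel[:, axis] / (a * H)   # line-of-sight displacement
    s[:, axis] %= box                      # periodic boundary conditions
    return s

rng = np.random.default_rng(0)
pos = rng.uniform(0, 25.0, size=(1000, 3))
vel = rng.normal(0, 300.0, size=(1000, 3))   # km/s
s = to_redshift_space(pos, vel)
print(s.shape)  # (1000, 3)
```

A typical 300 km/s peculiar velocity displaces a galaxy by ~3 Mpc/h in these units, which is why redshift-space positions cannot be disentangled from velocities without modeling.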


Learning neutrino effects in Cosmology with Convolutional Neural Networks

Giusarma, Elena, Hurtado, Mauricio Reyes, Villaescusa-Navarro, Francisco, He, Siyu, Ho, Shirley, Hahn, ChangHoon

arXiv.org Artificial Intelligence

Measuring the sum of the three active neutrino masses, $M_\nu$, is one of the most important challenges in modern cosmology. Massive neutrinos imprint characteristic signatures on several cosmological observables, in particular on the large-scale structure of the Universe. In order to maximize the information that can be retrieved from galaxy surveys, accurate theoretical predictions in the non-linear regime are needed. Currently, one way to achieve those predictions is to run cosmological numerical simulations. Unfortunately, producing those simulations requires substantial computational resources -- several hundred to a thousand core-hours for each neutrino mass case. In this work, we propose a new method, based on a deep learning network, to quickly generate simulations with massive neutrinos from standard $\Lambda$CDM simulations without neutrinos. We compute multiple relevant statistical measures of the deep-learning-generated simulations and conclude that our approach is an accurate alternative to traditional N-body techniques. In particular, the power spectrum agrees to within $\simeq 6\%$ down to the non-linear scale $k = 0.7~h\,{\rm Mpc}^{-1}$. Finally, our method allows us to generate massive-neutrino simulations 10,000 times faster than traditional methods.


Learning from Topology: Cosmological Parameter Estimation from the Large-scale Structure

Yip, Jacky H. T., Rouhiainen, Adam, Shiu, Gary

arXiv.org Artificial Intelligence

The topology of the large-scale structure of the universe contains valuable information on the underlying cosmological parameters. While persistent homology can extract this topological information, the optimal method for estimating parameters from it remains an open question. To address this, we propose a neural network model that maps persistence images to cosmological parameters. Through a parameter recovery test, we demonstrate that our model makes accurate and precise estimates, considerably outperforming conventional Bayesian inference approaches.
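The mapping described here is, at its core, a regression from an image-like summary to a parameter vector. A minimal numpy sketch of such a model (a two-layer MLP on flattened persistence images; the layer sizes and the choice of parameters are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(42)

# Batch of flattened persistence images (e.g. 20x20 pixels for each of
# 3 homology dimensions, stacked), regressed onto 2 parameters
# (say Omega_m and sigma_8 -- hypothetical targets for illustration).
n_batch, n_in, n_hidden, n_out = 8, 20 * 20 * 3, 64, 2

# He-style initialization for the ReLU hidden layer
W1 = rng.normal(0, np.sqrt(2 / n_in), (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, np.sqrt(2 / n_hidden), (n_hidden, n_out))
b2 = np.zeros(n_out)

def predict(pi_batch):
    """Two-layer MLP: persistence images -> parameter estimates."""
    h = np.maximum(pi_batch @ W1 + b1, 0.0)   # ReLU hidden layer
    return h @ W2 + b2                        # linear readout

pis = rng.random((n_batch, n_in))             # dummy persistence images
theta_hat = predict(pis)
print(theta_hat.shape)  # (8, 2)
```

In practice the weights would be trained by minimizing a regression loss (e.g. mean squared error) against parameters of the simulations that produced the images.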


Learnable wavelet neural networks for cosmological inference

Pedersen, Christian, Eickenberg, Michael, Ho, Shirley

arXiv.org Artificial Intelligence

Convolutional neural networks (CNNs) have been shown both to extract more information from cosmological fields than traditional two-point statistics and to marginalise over astrophysical effects extremely well. However, CNNs require large amounts of training data, which is potentially problematic in the domain of expensive cosmological simulations, and the resulting networks are difficult to interpret. In this work we apply the learnable scattering transform, a kind of convolutional neural network that uses trainable wavelets as filters, to the problem of cosmological inference and marginalisation over astrophysical effects. We present two models based on the scattering transform, one constructed for performance and one for interpretability, and compare them with a CNN. We find that scattering architectures are able to outperform a CNN, significantly so in the case of small training data samples. Additionally, we present a lightweight scattering network that is highly interpretable.
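A first-order scattering coefficient is the modulus of a wavelet convolution, $|x \ast \psi_\lambda|$, followed by a low-pass average; the "learnable" variant treats the wavelet parameters as trainable instead of fixed. A minimal 1-D numpy sketch with Morlet-like filters (filter width, bandwidth, and frequencies are illustrative, and no gradient training is shown):

```python
import numpy as np

def morlet(n, freq, sigma):
    """Complex Morlet-like wavelet of length n; in a learnable
    scattering network, freq and sigma would be trained by gradient
    descent rather than fixed on a dyadic grid."""
    t = np.arange(n) - n // 2
    return np.exp(1j * 2 * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma**2))

def scatter1d(x, freqs, sigma=4.0, width=9):
    """First-order scattering: modulus of wavelet convolutions,
    then a local average (low-pass) for deformation stability,
    then global pooling to one coefficient per filter."""
    coeffs = []
    for f in freqs:
        psi = morlet(width, f, sigma)
        u = np.abs(np.convolve(x, psi, mode="same"))          # |x * psi|
        s_loc = np.convolve(u, np.ones(width) / width, mode="same")
        coeffs.append(s_loc.mean())
    return np.array(coeffs)

x = np.sin(2 * np.pi * 0.1 * np.arange(128))   # toy 1-D "field"
s1 = scatter1d(x, freqs=[0.05, 0.1, 0.2])
print(s1.shape)  # (3,)
```

The cosmological application replaces the 1-D signal with 2-D or 3-D fields and stacks a second scattering order, but the modulus-then-average structure is the same.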


Robust Field-level Likelihood-free Inference with Galaxies

de Santi, Natalí S. M., Shao, Helen, Villaescusa-Navarro, Francisco, Abramo, L. Raul, Teyssier, Romain, Villanueva-Domingo, Pablo, Ni, Yueying, Anglés-Alcázar, Daniel, Genel, Shy, Hernandez-Martinez, Elena, Steinwandel, Ulrich P., Lovell, Christopher C., Dolag, Klaus, Castro, Tiago, Vogelsberger, Mark

arXiv.org Artificial Intelligence

We train graph neural networks to perform field-level likelihood-free inference using galaxy catalogs from state-of-the-art hydrodynamic simulations of the CAMELS project. Our models are rotationally, translationally, and permutation invariant and do not impose any cut on scale. From galaxy catalogs that contain only the 3D positions and radial velocities of $\sim 1{,}000$ galaxies in tiny $(25~h^{-1}{\rm Mpc})^3$ volumes, our models can infer the value of $\Omega_{\rm m}$ with approximately $12\%$ precision. More importantly, by testing the models on galaxy catalogs from thousands of hydrodynamic simulations, each having a different efficiency of supernova and AGN feedback, run with five different codes and subgrid models -- IllustrisTNG, SIMBA, Astrid, Magneticum, SWIFT-EAGLE -- we find that our models are robust to changes in astrophysics, subgrid physics, and subhalo/galaxy finder. Furthermore, we test our models on 1,024 simulations that cover a vast region in parameter space -- variations in 5 cosmological and 23 astrophysical parameters -- finding that the models extrapolate remarkably well. Our results indicate that the key to building a robust model is the use of both galaxy positions and velocities, suggesting that the networks have likely learned an underlying physical relation that does not depend on galaxy formation and is valid on scales larger than $\sim10~h^{-1}{\rm kpc}$.
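The invariances described above can be obtained by construction: build the graph from relative quantities only, linking galaxies within some radius and using pairwise distances and radial velocities (never raw coordinates) as features. A minimal numpy sketch of such a graph builder (the linking radius is illustrative, and periodic boundaries are omitted for brevity):

```python
import numpy as np

def build_graph(pos, v_rad, r_link=1.5):
    """Connect galaxy pairs closer than r_link; all features are
    rotation- and translation-invariant (pairwise distances and
    radial velocities), so the downstream GNN inherits the symmetry."""
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    src, dst = np.where((dist < r_link) & (dist > 0))
    edge_index = np.stack([src, dst])        # (2, n_edges), both directions
    edge_attr = dist[src, dst][:, None]      # invariant edge feature
    node_attr = v_rad[:, None]               # invariant node feature
    return edge_index, edge_attr, node_attr

rng = np.random.default_rng(1)
pos = rng.uniform(0, 25.0, (200, 3))         # positions in Mpc/h
v_rad = rng.normal(0, 300.0, 200)            # radial velocities in km/s
ei, ea, na = build_graph(pos, v_rad)
print(ei.shape[0])  # 2
```

Because no absolute position or orientation enters the features, rotating or translating the catalog leaves the graph, and hence any prediction computed from it, unchanged.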


Learning cosmology and clustering with cosmic graphs

Villanueva-Domingo, Pablo, Villaescusa-Navarro, Francisco

arXiv.org Artificial Intelligence

We train deep learning models on thousands of galaxy catalogues from the state-of-the-art hydrodynamic simulations of the CAMELS project to perform regression and inference. We employ Graph Neural Networks (GNNs), architectures designed to work with irregular and sparse data like the distribution of galaxies in the Universe. We first show that GNNs can learn to compute the power spectrum of galaxy catalogues with a few percent accuracy. We then train GNNs to perform likelihood-free inference at the galaxy-field level. Our models are able to infer the value of $\Omega_{\rm m}$ with $\sim12\%-13\%$ accuracy just from the positions of $\sim1000$ galaxies in a volume of $(25~h^{-1}{\rm Mpc})^3$ at $z=0$, while accounting for astrophysical uncertainties as modelled in CAMELS. Incorporating information from galaxy properties, such as stellar mass, stellar metallicity, and stellar radius, improves the accuracy to $4\%-8\%$. Our models are built to be translationally and rotationally invariant, and they can extract information from any scale larger than the minimum distance between two galaxies. However, our models are not completely robust: testing on simulations run with subgrid physics different from that used for training does not yield equally accurate results.
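The power spectrum the GNNs learn to reproduce is conventionally estimated by painting the catalogue onto a mesh, Fourier transforming the overdensity, and averaging $|\delta_k|^2$ in spherical k-shells. A minimal numpy sketch (nearest-grid-point assignment, no shot-noise or window corrections, illustrative grid size):

```python
import numpy as np

def power_spectrum(pos, box=25.0, grid=64):
    """Monopole power spectrum of a point catalogue: NGP mass
    assignment, FFT of the overdensity, average of |delta_k|^2 in
    spherical shells of width one fundamental frequency."""
    idx = (pos / box * grid).astype(int) % grid
    field = np.zeros((grid,) * 3)
    np.add.at(field, tuple(idx.T), 1.0)        # paint galaxies on mesh
    delta = field / field.mean() - 1.0         # overdensity field
    dk = np.fft.rfftn(delta) * (box / grid) ** 3
    kx = 2 * np.pi * np.fft.fftfreq(grid, d=box / grid)
    kz = 2 * np.pi * np.fft.rfftfreq(grid, d=box / grid)
    kmag = np.sqrt(kx[:, None, None] ** 2 + kx[None, :, None] ** 2
                   + kz[None, None, :] ** 2)
    pk3d = np.abs(dk) ** 2 / box ** 3
    kf, knyq = 2 * np.pi / box, np.pi * grid / box
    edges = np.arange(kf, knyq, kf)            # shells up to Nyquist
    shell = np.digitize(kmag.ravel(), edges)
    pk = np.array([pk3d.ravel()[shell == i].mean()
                   for i in range(1, len(edges))])
    return edges[:-1] + kf / 2, pk

rng = np.random.default_rng(2)
k, pk = power_spectrum(rng.uniform(0, 25.0, (1000, 3)))
print(k.shape == pk.shape)  # True
```

For the uniform random catalogue above, the result is dominated by shot noise; for a real galaxy catalogue the same estimator returns the clustering signal the networks are trained to match.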


Inferring halo masses with Graph Neural Networks

Villanueva-Domingo, Pablo, Villaescusa-Navarro, Francisco, Anglés-Alcázar, Daniel, Genel, Shy, Marinacci, Federico, Spergel, David N., Hernquist, Lars, Vogelsberger, Mark, Dave, Romeel, Narayanan, Desika

arXiv.org Artificial Intelligence

Understanding the halo-galaxy connection is fundamental to improving our knowledge of the nature and properties of dark matter. In this work we build a model that infers the mass of a halo given the positions, velocities, stellar masses, and radii of the galaxies it hosts. In order to capture information from correlations among galaxy properties and their phase space, we use Graph Neural Networks (GNNs), which are designed to work with irregular and sparse data. We train our models on galaxies from more than 2,000 state-of-the-art simulations from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. Our model, which accounts for cosmological and astrophysical uncertainties, is able to constrain halo masses with $\sim$0.2 dex accuracy. Furthermore, a GNN trained on one suite of simulations preserves part of its accuracy when tested on simulations run with a different code and a distinct subgrid physics model, showing the robustness of our method. The PyTorch Geometric implementation of the GNN is publicly available on GitHub at https://github.com/PabloVD/HaloGraphNet
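The quoted $\sim$0.2 dex accuracy is a statement about the scatter in $\log_{10}$ of the predicted-to-true mass ratio (0.2 dex corresponds to a factor of $10^{0.2} \approx 1.6$). A minimal sketch of the metric on mock predictions:

```python
import numpy as np

def dex_scatter(m_true, m_pred):
    """RMS scatter, in dex, between predicted and true halo masses."""
    return np.sqrt(np.mean((np.log10(m_pred) - np.log10(m_true)) ** 2))

rng = np.random.default_rng(3)
log_m = rng.uniform(11, 14, 500)                     # log10(M_halo / Msun)
m_true = 10.0 ** log_m
m_pred = 10.0 ** (log_m + rng.normal(0, 0.2, 500))   # mock 0.2 dex errors
print(dex_scatter(m_true, m_pred))
```

The printed value hovers around 0.2 by construction, matching the headline accuracy in the abstract.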


The Cosmic Graph: Optimal Information Extraction from Large-Scale Structure using Catalogues

Makinen, T. Lucas, Charnock, Tom, Lemos, Pablo, Porqueres, Natalia, Heavens, Alan, Wandelt, Benjamin D.

arXiv.org Machine Learning

We present an implicit likelihood approach to quantifying cosmological information over discrete catalogue data, assembled as graphs. To do so, we explore cosmological parameter constraints using mock dark matter halo catalogues. We employ Information Maximising Neural Networks (IMNNs) to quantify Fisher information extraction as a function of graph representation. We a) demonstrate the high sensitivity of modular graph structure to the underlying cosmology in the noise-free limit, b) show, through comparisons to traditional statistics, that graph neural network summaries automatically combine mass and clustering information, c) demonstrate that the networks can still extract information when catalogues are subject to noisy survey cuts, and d) illustrate how nonlinear IMNN summaries can be used as asymptotically optimal compressed statistics for Bayesian simulation-based inference. With small ($\sim$100 object) halo catalogues, we reduce the area of joint $\Omega_m, \sigma_8$ parameter constraints by a factor of 42 over the two-point correlation function. This work utilises a new IMNN implementation over graph data in Jax, which can take advantage of either numerical or automatic differentiability. We also show that graph IMNNs successfully compress simulations away from the fiducial model at which the network is fitted, indicating a promising alternative to n-point statistics in catalogue-based simulation-based analyses.
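The Fisher-information objective behind IMNNs has a compact form: for network summaries $t(d)$ with mean $\mu(\theta)$ and covariance $C$ over simulations, the Gaussian-summary Fisher matrix is $F = (\partial\mu/\partial\theta)^{\rm T} C^{-1} (\partial\mu/\partial\theta)$, with the derivatives taken by finite differences across simulation sets. A minimal numpy sketch on toy summaries (the toy model and step size are illustrative, not the paper's setup):

```python
import numpy as np

def fisher_from_summaries(t_fid, t_plus, t_minus, dtheta):
    """Gaussian-summary Fisher matrix from simulated summaries.

    t_fid           : (n_sims, n_summ) summaries at the fiducial model
    t_plus, t_minus : per-parameter lists of summaries at theta +/- dtheta
    dtheta          : (n_params,) finite-difference step sizes
    """
    C = np.atleast_2d(np.cov(t_fid, rowvar=False))
    Cinv = np.linalg.inv(C)
    # dmu/dtheta by central finite differences, shape (n_params, n_summ)
    dmu = np.array([(tp.mean(0) - tm.mean(0)) / (2 * dt)
                    for tp, tm, dt in zip(t_plus, t_minus, dtheta)])
    return dmu @ Cinv @ dmu.T

rng = np.random.default_rng(4)
theta0, dt = 1.0, 0.1
# Toy "simulator": two Gaussian summaries whose mean tracks theta
simulate = lambda th: rng.normal(th, 0.5, size=(2000, 2))
F = fisher_from_summaries(simulate(theta0),
                          [simulate(theta0 + dt)],
                          [simulate(theta0 - dt)], [dt])
print(F.shape)  # (1, 1)
```

An IMNN trains the network producing $t(d)$ to maximise the determinant of this matrix, so that the summaries saturate the available Fisher information.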


What Can We Learn About the Universe from Just One Galaxy?

The New Yorker

Imagine if you could look at a snowflake at the South Pole and determine the size and the climate of all of Antarctica. Or study a randomly selected tree in the Amazon rain forest and, from that one tree--be it rare or common, narrow or wide, young or old--deduce characteristics of the forest as a whole. Or, what if, by looking at one galaxy among the hundred billion or so in the observable universe, one could say something substantial about the universe as a whole? A recent paper, whose lead authors include a cosmologist, a galaxy-formation expert, and an undergraduate named Jupiter (who did the initial work), suggests that this may be the case. The result at first seemed "crazy" to the paper's authors.