AITopics | Wandelt, Benjamin

Collaborating Authors

Wandelt, Benjamin


How many simulations do we need for simulation-based inference in cosmology?

Bairagi, Anirban, Wandelt, Benjamin, Villaescusa-Navarro, Francisco

arXiv.org Machine Learning, Mar-17-2025

How many simulations do we need to train machine learning methods to extract information available from summary statistics of the cosmological density field? Neural methods have shown the potential to extract non-linear information available from cosmological data. Success depends critically on having sufficient simulations for training the networks and appropriate network architectures. In the first detailed convergence study of neural network training for cosmological inference, we show that currently available simulation suites, such as the Quijote Latin Hypercube (LH) with 2000 simulations, do not provide sufficient training data for a generic neural network to reach the optimal regime, even for the dark matter power spectrum, and in an idealized case. We discover an empirical neural scaling law that predicts how much information a neural network can extract from a highly informative summary statistic, the dark matter power spectrum, as a function of the number of simulations used to train the network, for a wide range of architectures and hyperparameters. We combine this result with the Cramér-Rao information bound to forecast the number of training simulations needed for near-optimal information extraction. To verify our method, we created the largest publicly released simulation data set in cosmology, the Big Sobol Sequence (BSQ), consisting of 32,768 $\Lambda$CDM N-body simulations uniformly covering the $\Lambda$CDM parameter space. Our method enables efficient planning of simulation campaigns for machine learning applications in cosmology, while the BSQ dataset provides an unprecedented resource for studying the convergence behavior of neural networks in cosmological parameter inference. Our results suggest that new large simulation suites or new training approaches will be necessary to achieve information-optimal parameter inference from non-linear simulations.
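The forecasting step described above can be pictured with a minimal sketch. It assumes, purely for illustration, that the gap between a network's parameter error and the Cramér-Rao floor shrinks as a power law in the number of training simulations; the abstract does not state the functional form. The names `fit_power_law` and `sims_needed`, the synthetic data, and the tolerance are all hypothetical.

```python
import math

def fit_power_law(n_sims, excess):
    """Least-squares fit of log(excess) = log(A) - alpha * log(N).

    `excess` is the (hypothetical) error above the Cramér-Rao floor
    measured at each training-set size in `n_sims`.
    """
    xs = [math.log(n) for n in n_sims]
    ys = [math.log(e) for e in excess]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    alpha = -slope
    amplitude = math.exp(y_mean - slope * x_mean)
    return amplitude, alpha

def sims_needed(amplitude, alpha, tolerance):
    """Smallest N for which amplitude * N**(-alpha) <= tolerance."""
    return math.ceil((amplitude / tolerance) ** (1.0 / alpha))

# Synthetic data generated from the assumed law itself (A = 2, alpha = 0.5).
# These numbers are NOT taken from the paper.
n_sims = [250, 500, 1000, 2000]
excess = [2.0 * n ** -0.5 for n in n_sims]

amplitude, alpha = fit_power_law(n_sims, excess)
needed = sims_needed(amplitude, alpha, tolerance=0.01)
print(f"A = {amplitude:.3f}, alpha = {alpha:.3f}, simulations needed = {needed}")
```

Fitting in log-log space keeps the sketch dependency-free. A real convergence study would also need uncertainty estimates on the fitted exponent before extrapolating far beyond the measured range.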

artificial intelligence, machine learning, simulation, (19 more...)


2503.13755

Country: North America > United States (0.69)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology

Ho, Matthew, Bartlett, Deaglan J., Chartier, Nicolas, Cuesta-Lazaro, Carolina, Ding, Simon, Lapel, Axel, Lemos, Pablo, Lovell, Christopher C., Makinen, T. Lucas, Modi, Chirag, Pandya, Viraj, Pandey, Shivam, Perez, Lucia A., Wandelt, Benjamin, Bryan, Greg L.

arXiv.org Artificial Intelligence, Feb-6-2024

This paper presents the Learning the Universe Implicit Likelihood Inference (LtU-ILI) pipeline, a codebase for rapid, user-friendly, and cutting-edge machine learning (ML) inference in astrophysics and cosmology. The pipeline includes software for implementing various neural architectures, training schema, priors, and density estimators in a manner easily adaptable to any research workflow. It includes comprehensive validation metrics to assess posterior estimate coverage, enhancing the reliability of inferred results. Additionally, the pipeline is easily parallelizable, designed for efficient exploration of modeling hyperparameters. To demonstrate its capabilities, we present real applications across a range of astrophysics and cosmology problems, such as: estimating galaxy cluster masses from X-ray photometry; inferring cosmology from matter power spectra and halo point clouds; characterising progenitors in gravitational wave signals; capturing physical dust parameters from galaxy colors and luminosities; and establishing properties of semi-analytic models of galaxy formation. We also include exhaustive benchmarking and comparisons of all implemented methods as well as discussions about the challenges and pitfalls of ML inference in astronomical sciences. All code and examples are made publicly available at https://github.com/maho3/ltu-ili.
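To make the idea of a posterior coverage check concrete, here is a minimal sketch on a toy problem with a known posterior. It does not use the LtU-ILI API; the function `empirical_coverage` and the conjugate Gaussian model (prior N(0, 1), unit-variance noise) are assumptions chosen so the exact posterior is available in closed form.

```python
import random
from statistics import NormalDist

def empirical_coverage(level, n_trials, seed=0):
    """Fraction of trials in which the true parameter falls inside the
    central `level` credible interval of the posterior.

    Toy model (an assumption, not from the paper):
        theta ~ N(0, 1),  x | theta ~ N(theta, 1)
    which gives the exact posterior theta | x ~ N(x / 2, 1 / 2).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        theta = rng.gauss(0.0, 1.0)   # draw a true parameter from the prior
        x = rng.gauss(theta, 1.0)     # simulate one observation
        posterior = NormalDist(mu=x / 2.0, sigma=0.5 ** 0.5)
        lower = posterior.inv_cdf((1.0 - level) / 2.0)
        upper = posterior.inv_cdf((1.0 + level) / 2.0)
        hits += lower <= theta <= upper
    return hits / n_trials

coverage_68 = empirical_coverage(level=0.68, n_trials=4000)
print(f"nominal 0.68, empirical {coverage_68:.3f}")
```

For a well-calibrated posterior the empirical fraction should sit close to the nominal level. A learned posterior that is overconfident would give a noticeably lower fraction, and an underconfident one a higher fraction.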

artificial intelligence, machine learning, posterior, (16 more...)


2402.05137

Country:

North America > United States (1.00)
Europe (0.92)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Optimal simulation-based Bayesian decisions

Alsing, Justin, Edwards, Thomas D. P., Wandelt, Benjamin

arXiv.org Machine Learning, Nov-9-2023

We present a framework for the efficient computation of optimal Bayesian decisions under intractable likelihoods, by learning a surrogate model for the expected utility (or its distribution) as a function of the action and data spaces. We leverage recent advances in simulation-based inference and Bayesian optimization to develop active learning schemes to choose where in parameter and action spaces to simulate. This allows us to learn the optimal action in as few simulations as possible. The resulting framework is extremely simulation efficient, typically requiring fewer model calls than the associated posterior inference task alone, and a factor of $100$ to $1000$ more efficient than Monte Carlo-based methods. Our framework opens up new capabilities for performing Bayesian decision making, particularly in the previously challenging regime where likelihoods are intractable and simulations are expensive.
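For orientation, the quantity being optimized can be sketched with a brute-force Monte Carlo estimate. This is the kind of baseline the abstract compares against, not the surrogate-based method it proposes. The sketch assumes posterior samples are already available; the function names, the quadratic utility, and the action grid are all hypothetical.

```python
import random

def expected_utility(action, posterior_samples, utility):
    """Monte Carlo estimate of E[utility(action, theta) | data]."""
    return (sum(utility(action, theta) for theta in posterior_samples)
            / len(posterior_samples))

def best_action(actions, posterior_samples, utility):
    """Pick the candidate action with the highest estimated expected utility."""
    return max(actions,
               key=lambda a: expected_utility(a, posterior_samples, utility))

def quadratic_utility(action, theta):
    """Toy utility: penalize squared distance between action and parameter."""
    return -(action - theta) ** 2

# Stand-in posterior samples (an assumption): theta | data ~ N(1.0, 0.5^2).
rng = random.Random(0)
posterior_samples = [rng.gauss(1.0, 0.5) for _ in range(5000)]

# Candidate actions on a grid from -1.0 to 3.0 in steps of 0.1.
actions = [i / 10 for i in range(-10, 31)]
chosen = best_action(actions, posterior_samples, quadratic_utility)
print("chosen action:", chosen)
```

Under a quadratic utility the optimal action is the posterior mean, so the grid search should land on the grid point nearest the sample mean. Every candidate action reuses all posterior samples here, which is what makes this approach costly when each sample requires an expensive simulation.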

artificial intelligence, machine learning, simulation, (16 more...)


2311.05742

Country: North America > United States > Maryland (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)


Information-Ordered Bottlenecks for Adaptive Semantic Compression

Ho, Matthew, Zhao, Xiaosheng, Wandelt, Benjamin

arXiv.org Artificial Intelligence, May-18-2023

We present the information-ordered bottleneck (IOB), a neural layer designed to adaptively compress data into latent variables ordered by likelihood maximization. Without retraining, IOB nodes can be truncated at any bottleneck width, capturing the most crucial information in the first latent variables. Unifying several previous approaches, we show that IOBs achieve near-optimal compression for a given encoding architecture and can assign ordering to latent signals in a manner that is semantically meaningful. IOBs demonstrate a remarkable ability to compress embeddings of image and text data, leveraging the performance of SOTA architectures such as CNNs, transformers, and diffusion models. Moreover, we introduce a novel theory for estimating global intrinsic dimensionality with IOBs and show that they recover SOTA dimensionality estimates for complex synthetic data.
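The truncation property mentioned above can be illustrated with a small sketch. It shows only the masking operation applied to an already-ordered latent vector, not how an IOB is trained. The helper names and the example latent values are hypothetical, and an identity decoder is assumed for simplicity.

```python
def truncate(latents, width):
    """Keep the first `width` latent values and zero out the rest."""
    if not 0 <= width <= len(latents):
        raise ValueError("width must lie between 0 and len(latents)")
    return [z if i < width else 0.0 for i, z in enumerate(latents)]

def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical latent vector, assumed to be ordered from most to least
# informative. With an identity decoder, the reconstruction is simply the
# truncated vector itself.
latents = [3.0, 1.5, 0.5, 0.1]

errors = [squared_error(latents, truncate(latents, k))
          for k in range(len(latents) + 1)]
for k, err in enumerate(errors):
    print(f"width {k}: reconstruction error {err:.4f}")
```

Because the leading entries carry the most signal in this toy vector, most of the error is removed by the first one or two latents, which mirrors the behavior the abstract describes for trained IOB layers.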

artificial intelligence, dimensionality, machine learning, (18 more...)


2305.11213

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


The CAMELS project: public data release

Villaescusa-Navarro, Francisco, Genel, Shy, Anglés-Alcázar, Daniel, Perez, Lucia A., Villanueva-Domingo, Pablo, Wadekar, Digvijay, Shao, Helen, Mohammad, Faizan G., Hassan, Sultan, Moser, Emily, Lau, Erwin T., Valle, Luis Fernando Machado Poletti, Nicola, Andrina, Thiele, Leander, Jo, Yongseok, Philcox, Oliver H. E., Oppenheimer, Benjamin D., Tillman, Megan, Hahn, ChangHoon, Kaushal, Neerav, Pisani, Alice, Gebhardt, Matthew, Delgado, Ana Maria, Caliendo, Joyce, Kreisch, Christina, Wong, Kaze W. K., Coulton, William R., Eickenberg, Michael, Parimbelli, Gabriele, Ni, Yueying, Steinwandel, Ulrich P., La Torre, Valentina, Dave, Romeel, Battaglia, Nicholas, Nagai, Daisuke, Spergel, David N., Hernquist, Lars, Burkhart, Blakesley, Narayanan, Desika, Wandelt, Benjamin, Somerville, Rachel S., Bryan, Greg L., Viel, Matteo, Li, Yin, Irsic, Vid, Kraljic, Katarina, Vogelsberger, Mark

arXiv.org Artificial Intelligence, Jan-4-2022

The Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4,233 cosmological simulations: 2,049 N-body and 2,184 state-of-the-art hydrodynamic simulations that sample a vast volume in parameter space. In this paper we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogues, power spectra, bispectra, Lyman-$\alpha$ spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release over one thousand catalogues that contain billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz Semi-Analytic Model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.

artificial intelligence, machine learning, simulation, (17 more...)


2201.013

Country:

North America > United States > New York (0.28)
North America > United States > Connecticut (0.28)
North America > United States > Massachusetts > Middlesex County (0.14)
(3 more...)

Genre: Research Report (0.44)

Industry: Energy (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
