Aarts, Gert
Strategic White Paper on AI Infrastructure for Particle, Nuclear, and Astroparticle Physics: Insights from JENA and EuCAIF
Caron, Sascha, Ipp, Andreas, Aarts, Gert, Bíró, Gábor, Bonacorsi, Daniele, Cuoco, Elena, Doglioni, Caterina, Dorigo, Tommaso, Pardiñas, Julián García, Giagu, Stefano, Golling, Tobias, Heinrich, Lukas, Heng, Ik Siong, Isar, Paula Gina, Potamianos, Karolos, Teodorescu, Liliana, Veitch, John, Vischia, Pietro, Weniger, Christoph
Artificial intelligence (AI) is transforming scientific research, with deep learning methods playing a central role in data analysis, simulations, and signal detection across particle, nuclear, and astroparticle physics. Within the JENA communities (ECFA, NuPECC, and APPEC) and as part of the EuCAIF initiative, AI integration is advancing steadily. However, broader adoption remains constrained by challenges such as limited computational resources, a lack of expertise, and difficulties in transitioning from research and development (R&D) to production. This white paper provides a strategic roadmap, informed by a community survey, to address these barriers. It outlines critical infrastructure requirements, prioritizes training initiatives, and proposes funding strategies to scale AI capabilities across fundamental physics over the next five years.
Physics-Conditioned Diffusion Models for Lattice Gauge Theory
Zhu, Qianteng, Aarts, Gert, Wang, Wei, Zhou, Kai, Wang, Lingxiao
We develop diffusion models for simulating lattice gauge theories, where stochastic quantization is explicitly incorporated as a physical condition for sampling. We demonstrate the applicability of this novel sampler to U(1) gauge theory in two spacetime dimensions and find that a model trained at a small inverse coupling constant can be extrapolated to larger inverse coupling regions without encountering the topological freezing problem. Additionally, the trained model can be employed to sample configurations on different lattice sizes without requiring further training. The exactness of the generated samples is ensured by incorporating Metropolis-adjusted Langevin dynamics into the generation process. Furthermore, we demonstrate that this approach enables more efficient sampling of topological quantities compared to traditional algorithms such as Hybrid Monte Carlo and Langevin simulations.
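For orientation, a minimal sketch (assuming NumPy) of the Metropolis-adjusted Langevin step that guarantees exactness, written for 2D U(1) link angles with the Wilson plaquette action. It is illustrative only: the exact action gradient stands in for the drift, whereas in the work above the drift would be supplied by the trained, physics-conditioned diffusion model; lattice size, coupling and step size are arbitrary choices.

import numpy as np

def plaquettes(theta):
    # theta has shape (2, L, L): link angles in the two lattice directions
    return (theta[0] + np.roll(theta[1], -1, axis=0)
            - np.roll(theta[0], -1, axis=1) - theta[1])

def action(theta, beta):
    # Wilson action S = -beta * sum_n cos(plaquette angle)
    return -beta * np.sum(np.cos(plaquettes(theta)))

def action_grad(theta, beta):
    # analytic gradient of S; in the setting above the drift would instead be
    # supplied by the trained, physics-conditioned diffusion model
    s = np.sin(plaquettes(theta))
    g0 = beta * (s - np.roll(s, 1, axis=1))
    g1 = beta * (np.roll(s, 1, axis=0) - s)
    return np.stack([g0, g1])

def mala_step(theta, beta, step, rng):
    # one Metropolis-adjusted Langevin update; the accept/reject step makes the
    # sampler exact with respect to exp(-S)
    prop = (theta - step * action_grad(theta, beta)
            + np.sqrt(2.0 * step) * rng.standard_normal(theta.shape))

    def log_q(a, b):  # log density (up to a constant) of proposing a from b
        mu = b - step * action_grad(b, beta)
        return -np.sum((a - mu) ** 2) / (4.0 * step)

    log_alpha = (action(theta, beta) - action(prop, beta)
                 + log_q(theta, prop) - log_q(prop, theta))
    return prop if np.log(rng.uniform()) < log_alpha else theta

def topological_charge(theta):
    # geometric definition: plaquette angles wrapped to (-pi, pi], summed, over 2*pi
    return np.sum(np.angle(np.exp(1j * plaquettes(theta)))) / (2.0 * np.pi)

rng = np.random.default_rng(0)
L, beta, step = 4, 2.0, 0.01
theta = rng.uniform(-np.pi, np.pi, size=(2, L, L))
# angles are kept non-compact; the action and observables are 2*pi-periodic,
# so this is equivalent to sampling the compact theory
for _ in range(1000):
    theta = mala_step(theta, beta, step, rng)
print("topological charge Q =", round(topological_charge(theta), 3))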
Physics-Driven Learning for Inverse Problems in Quantum Chromodynamics
Aarts, Gert, Fukushima, Kenji, Hatsuda, Tetsuo, Ipp, Andreas, Shi, Shuzhe, Wang, Lingxiao, Zhou, Kai
The integration of deep learning techniques and physics-driven designs is reshaping the way we address inverse problems, in which accurate physical properties are extracted from complex data sets. This is particularly relevant for quantum chromodynamics (QCD), the theory of strong interactions, with its inherent limitations in observational data and demanding computational approaches. This perspective highlights the advances and potential of physics-driven learning methods, focusing on the prediction of physical quantities relevant to QCD physics, and drawing connections to machine learning (ML). It is shown that the fusion of ML and physics can lead to more efficient and reliable problem-solving strategies. Key ideas of ML, the methodology of embedding physics priors, and generative models as inverse modelling of physical probability distributions are introduced. Specific applications cover first-principles lattice calculations, and the QCD physics of hadrons, neutron stars, and heavy-ion collisions. These examples provide a structured and concise overview of how incorporating prior knowledge, such as symmetry, continuity and equations, into deep learning designs can address diverse inverse problems across different physical sciences.
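As a minimal illustration of embedding a physics prior into a network design, the fragment below (assuming NumPy) hard-wires a Z2 symmetry (phi -> -phi invariance) into a toy surrogate by explicit symmetrisation of its output. The architecture, sizes and data are arbitrary and only meant to show the pattern, not any model used in the works discussed.

import numpy as np

rng = np.random.default_rng(1)

# a tiny multilayer perceptron acting on a field configuration phi (8 sites)
W1, b1 = 0.1 * rng.standard_normal((16, 8)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((1, 16)), np.zeros(1)

def mlp(phi):
    return W2 @ np.tanh(W1 @ phi + b1) + b2

def z2_symmetric(phi):
    # embed the Z2 prior by explicit symmetrisation; the constraint then holds
    # by construction, for any values of the weights
    return 0.5 * (mlp(phi) + mlp(-phi))

phi = rng.standard_normal(8)
assert np.allclose(z2_symmetric(phi), z2_symmetric(-phi))
print(z2_symmetric(phi))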
Random Matrix Theory for Stochastic Gradient Descent
Park, Chanju, Favoni, Matteo, Lucini, Biagio, Aarts, Gert
Machine learning (ML) and artificial intelligence (AI) can provide powerful tools for the scientific community, as demonstrated by the recent Nobel Prize in Chemistry. Conversely, insights from traditional physics theories also contribute to a deeper understanding of the mechanism of learning. Ref. [1] contains a broad overview of the successful cross-fertilisation between ML and the physical sciences, covering a number of domains. One way to mitigate possible scepticism with regard to using ML as a "black box" is by unveiling the dynamics of training (or learning) and explaining how the relevant information is engraved in the model during the training stage. To further develop this programme, we study here the dynamics of first-order stochastic gradient descent as applied to weight matrices, reporting and expanding on the work presented in Ref. [2]. When training ML models, weight matrices are commonly updated by one of the variants of the stochastic gradient descent algorithm. The dynamics can then be decomposed into a drift and a fluctuating term, and such a system can be described by a discrete Langevin equation. The dynamics of stochastic matrix updates is richer than the dynamics of vector or scalar quantities, as captured by Dyson Brownian motion and random matrix theory (RMT), with the appearance of universal features for the eigenvalues [3-9]. Earlier descriptions of the statistical properties of weight matrices in terms of RMT can be found in e.g.
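A compact numerical illustration (assuming NumPy) of the universal eigenvalue features mentioned above: purely stochastic symmetric matrix updates, i.e. Dyson Brownian motion without drift, produce eigenvalue repulsion, visible in the mean nearest-neighbour spacing ratio approaching the Gaussian-orthogonal-ensemble value of approximately 0.53. Matrix size, step size and number of updates are arbitrary choices for the sketch.

import numpy as np

rng = np.random.default_rng(2)
N, n_updates, dt = 50, 200, 1e-3

# purely stochastic symmetric matrix updates (Dyson Brownian motion, no drift)
W = np.zeros((N, N))
ratios = []
for _ in range(n_updates):
    G = rng.standard_normal((N, N))
    W += np.sqrt(dt) * (G + G.T) / np.sqrt(2.0)
    lam = np.linalg.eigvalsh(W)
    s = np.diff(lam)                                   # nearest-neighbour spacings
    ratios.extend(np.minimum(s[1:] / s[:-1], s[:-1] / s[1:]))

# eigenvalue repulsion: the mean spacing ratio approaches the Gaussian orthogonal
# ensemble value of approximately 0.53, independently of the microscopic details
print("mean spacing ratio:", np.mean(ratios))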
Diffusion models learn distributions generated by complex Langevin dynamics
Habibi, Diaa E., Aarts, Gert, Wang, Lingxiao, Zhou, Kai
The probability distribution effectively sampled by a complex Langevin process for theories with a sign problem is not known a priori and notoriously hard to understand. Diffusion models, a class of generative AI, can learn distributions from data. In this contribution, we explore the ability of diffusion models to learn the distributions created by a complex Langevin process.
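For context, a minimal complex Langevin sketch (assuming NumPy) for a Gaussian toy model with a complex coupling; the two-dimensional histogram of the complexified variable (x, y) generated in this way is the kind of distribution the diffusion model would be trained on. The choice of action and of all parameters is illustrative only.

import numpy as np

rng = np.random.default_rng(3)

# toy model with a sign problem: S(z) = 0.5 * sigma * z**2 with complex sigma
sigma = 1.0 + 1.0j
dt, n_steps, n_therm = 1e-3, 200_000, 10_000

z = 0.0 + 0.0j
samples = []
for t in range(n_steps):
    eta = rng.standard_normal()                            # real noise only
    z = z + dt * (-sigma * z) + np.sqrt(2.0 * dt) * eta    # drift = -dS/dz
    if t >= n_therm:
        samples.append((z.real, z.imag))

xy = np.asarray(samples)
# the (x, y) distribution of this process is what a diffusion model would learn;
# for generic actions it is not known a priori
zc = xy[:, 0] + 1j * xy[:, 1]
print("<z^2> =", np.mean(zc**2), " (exact: 1/sigma =", 1.0 / sigma, ")")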
Dyson Brownian motion and random matrix dynamics of weight matrices during learning
Aarts, Gert, Hajizadeh, Ouraman, Lucini, Biagio, Park, Chanju
During training, weight matrices in machine learning architectures are updated using stochastic gradient descent or variations thereof. In this contribution we employ concepts of random matrix theory to analyse the resulting stochastic matrix dynamics. We first demonstrate that the dynamics can generically be described using Dyson Brownian motion, leading to e.g. eigenvalue repulsion. The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size, explaining the empirically observed linear scaling rule. We verify this linear scaling in the restricted Boltzmann machine. Subsequently we study weight matrix dynamics in transformers (a nano-GPT), following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
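Schematically, with mini-batch $B$, learning rate $\alpha$ and loss $\mathcal{L}$, the update can be decomposed into a drift and a zero-mean fluctuation (the notation here is generic and not tied to a particular architecture):

$$
W_{t+1} \;=\; W_t - \alpha\, \nabla_W \mathcal{L}_B(W_t)
        \;=\; W_t - \alpha\, \nabla_W \mathcal{L}(W_t) + \alpha\, \xi_t ,
\qquad \mathrm{Cov}(\xi_t) \;\propto\; \frac{1}{|B|} .
$$

Identifying the discrete update with a Langevin step of size $\alpha$, the fluctuations contribute a variance $\propto \alpha^2/|B|$ per step, i.e. an effective diffusion coefficient $\propto \alpha/|B|$: the level of stochasticity depends on the learning rate and the mini-batch size only through this ratio, which is the linear scaling rule referred to above.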
On learning higher-order cumulants in diffusion models
Aarts, Gert, Habibi, Diaa E., Wang, Lingxiao, Zhou, Kai
To analyse how diffusion models learn correlations beyond Gaussian ones, we study the behaviour of higher-order cumulants, or connected n-point functions, under both the forward and backward process. We derive explicit expressions for the moment- and cumulant-generating functionals, in terms of the distribution of the initial data and the properties of the forward process. It is shown analytically that during the forward process higher-order cumulants are conserved in models without a drift, such as the variance-expanding scheme, and that therefore the endpoint of the forward process maintains nontrivial correlations. We demonstrate that, since these correlations are encoded in the score function, higher-order cumulants are learnt in the backward process, even when starting from a normal prior. We confirm our analytical results in an exactly solvable toy model with nonzero cumulants and in scalar lattice field theory.
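A quick numerical check (assuming NumPy) of the drift-free statement: because cumulants of independent variables add and all Gaussian cumulants beyond the second vanish, the third cumulant of the data is unchanged by a variance-expanding noising step. The initial distribution and noise level below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(4)

def kappa3(x):
    # third cumulant = third central moment
    return np.mean((x - x.mean()) ** 3)

# skewed (non-Gaussian) initial data with a nonzero third cumulant
x0 = rng.exponential(scale=1.0, size=1_000_000)

# drift-free, variance-expanding forward step: x_t = x_0 + sigma_t * eta
sigma_t = 2.0
xt = x0 + sigma_t * rng.standard_normal(x0.size)

# cumulants of independent variables add, and Gaussian cumulants beyond the
# second vanish, so kappa_3 is unchanged (up to statistical noise)
print(kappa3(x0), kappa3(xt))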
Generative Diffusion Models for Lattice Field Theory
Wang, Lingxiao, Aarts, Gert, Zhou, Kai
This study delves into the connection between machine learning and lattice field theory by linking generative diffusion models (DMs) with stochastic quantization, from a stochastic differential equation perspective. We show that DMs can be conceptualized by reversing a stochastic process driven by the Langevin equation, which then produces samples from an initial distribution to approximate the target distribution. In a toy model, we highlight the capability of DMs to learn effective actions. Furthermore, we demonstrate the feasibility of DMs acting as a global sampler for generating configurations in the two-dimensional $\phi^4$ quantum lattice field theory.
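To make the "reversed Langevin" picture concrete, the sketch below (assuming NumPy) integrates the reverse of a variance-expanding process for a one-dimensional bimodal toy distribution, a two-Gaussian stand-in for a $\phi^4$-like double well, for which the time-dependent score is known in closed form. In the study above the score would instead be learned by the diffusion model; the Gaussian prior at the largest noise scale is only approximate, and all parameters are arbitrary.

import numpy as np

rng = np.random.default_rng(5)

# bimodal toy target: mixture of Gaussians at +/- a (a phi^4-like double well)
a, s = 2.0, 0.3
g0, T, n_steps, n_samples = 1.0, 9.0, 500, 20_000
dt = T / n_steps

def score(x, t):
    # exact score of the noised mixture under the variance-expanding process;
    # this is the object a diffusion model would learn from data
    v = s**2 + g0**2 * t
    log_p = -(x - a)**2 / (2.0 * v)
    log_m = -(x + a)**2 / (2.0 * v)
    w_plus = 1.0 / (1.0 + np.exp(np.clip(log_m - log_p, -700.0, 700.0)))
    return -(w_plus * (x - a) + (1.0 - w_plus) * (x + a)) / v

# start from the (approximate) Gaussian prior at the largest noise scale and
# integrate the reverse stochastic differential equation back towards t = 0
x = np.sqrt(s**2 + g0**2 * T) * rng.standard_normal(n_samples)
t = T
for _ in range(n_steps):
    x += g0**2 * score(x, t) * dt + g0 * np.sqrt(dt) * rng.standard_normal(n_samples)
    t -= dt

print("fraction of samples within 3*s of +/- a:", np.mean(np.abs(np.abs(x) - a) < 3 * s))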
Diffusion Models as Stochastic Quantization in Lattice Field Theory
Wang, Lingxiao, Aarts, Gert, Zhou, Kai
In this work, we establish a direct connection between generative diffusion models (DMs) and stochastic quantization (SQ). The DM is realized by approximating the reversal of a stochastic process dictated by the Langevin equation, generating samples from a prior distribution to effectively mimic the target distribution. Using numerical simulations, we demonstrate that the DM can serve as a global sampler for generating quantum lattice field configurations in two-dimensional $\phi^4$ theory. We also show that DMs can notably reduce autocorrelation times in the Markov chain, especially in the critical region where standard Markov chain Monte Carlo (MCMC) algorithms experience critical slowing down. The findings can potentially inspire further advancements in lattice field theory simulations, in particular in cases where it is expensive to generate large ensembles.
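Autocorrelation claims of this kind are usually quantified through the integrated autocorrelation time of an observable along the chain. A minimal estimator with Sokal-style automatic windowing is sketched below (assuming NumPy) and checked on an AR(1) series with known $\tau_{\rm int}$; the window parameter and the test series are arbitrary choices, not the analysis used in the paper.

import numpy as np

def tau_int(obs, c=5.0):
    # integrated autocorrelation time with Sokal-style automatic windowing
    x = np.asarray(obs, dtype=float) - np.mean(obs)
    n = len(x)
    f = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(f * np.conj(f))[:n]   # autocovariance via FFT
    acf /= acf[0]
    tau = 0.5
    for w in range(1, n):
        tau += acf[w]
        if w >= c * tau:                      # stop once the window exceeds c * tau
            break
    return tau

# check on an AR(1) chain, for which tau_int = (1 + rho) / (2 * (1 - rho)) = 9.5
rng = np.random.default_rng(6)
rho, x = 0.9, np.zeros(100_000)
for i in range(1, len(x)):
    x[i] = rho * x[i - 1] + rng.standard_normal()
print("estimated tau_int:", tau_int(x))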
Towards a Shapley Value Graph Framework for Medical peer-influence
Duell, Jamie, Seisenberger, Monika, Aarts, Gert, Zhou, Shangming, Fan, Xiuyi
Explainable Artificial Intelligence (XAI) is at the forefront of Artificial Intelligence (AI) research, with a variety of techniques and libraries coming to fruition in recent years, e.g., model-agnostic explanations [1, 2], counterfactual explanations [3, 4], contrastive explanations [5] and argumentation-based explanations [6, 7]. XAI methods are ubiquitous across fields of Machine Learning (ML), where the trust placed in applied ML is undermined by the black-box nature of the methods. Generally speaking, an ML model takes a set of inputs (features) and predicts some output, and existing work on XAI predominantly focuses on understanding the relations between features and output. These approaches to XAI are successful in many areas, as they suggest how the output of a model might change should we change its inputs. Thus, interventions, that is, manipulating inputs in specific ways in the hope of reaching some desired outcome, can be prompted by existing XAI methods when they are capable of providing relatively accurate explanations [8, 9]. However, since existing XAI methods hold little knowledge of the consequences of interventions [10], such interventions can be susceptible to error. From both a business and an ethical standpoint, we must reach beyond understanding the relations between features and outputs; we also need to understand the influence that features have on one another. We believe such knowledge holds the key to a deeper understanding of model behaviours and the identification of suitable interventions.
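For reference, Shapley values attribute a model's output for a single instance to its input features by averaging marginal contributions over all feature coalitions. The sketch below (assuming NumPy) computes them exactly by enumeration, with "absent" features replaced by a baseline, which is one common convention among several; the model, instance and baseline are toy choices, and for a linear model the values reduce to w_i * (x_i - baseline_i).

import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    # exact Shapley values for one instance x of model f, with 'absent' features
    # replaced by a baseline (one common convention; others exist)
    n = len(x)
    phi = np.zeros(n)

    def value(S):
        z = baseline.copy()
        z[list(S)] = x[list(S)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# toy linear model: the Shapley values reduce to w_i * (x_i - baseline_i)
w = np.array([1.0, -2.0, 0.5])
f = lambda z: float(w @ z)
x, base = np.array([1.0, 1.0, 2.0]), np.zeros(3)
print(shapley_values(f, x, base))     # expected: [ 1.  -2.   1. ]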