Goto

Collaborating Authors

 correlator


Fermions and Supersymmetry in Neural Network Field Theories

Frank, Samuel, Halverson, James, Maiti, Anindita, Ruehle, Fabian

arXiv.org Artificial Intelligence

We introduce fermionic neural network field theories via Grassmann-valued neural networks. Free theories are obtained by a generalization of the Central Limit Theorem to Grassmann variables. This enables the realization of the free Dirac spinor at infinite width and a four fermion interaction at finite width. Yukawa couplings are introduced by breaking the statistical independence of the output weights for the fermionic and bosonic fields. A large class of interacting supersymmetric quantum mechanics and field theory models are introduced by super-affine transformations on the input that realize a superspace formalism.


Viability of perturbative expansion for quantum field theories on neurons

Sen, Srimoyee, Vaidya, Varun

arXiv.org Artificial Intelligence

Accelerated progress in machine learning (ML) over the past decade has had significant impact across many research domains, including physics, and has motivated substantial interdisciplinary work. At the intersection of physics and machine learning, two prominent practical questions have emerged: 1. Can techniques from statistical mechanics and the path integral formulation of quantum field theory (QFT) help us build a theoretical understanding of how neural networks learn? 2. Can neural networks be used to facilitate computations in quantum field theory? These two questions are deeply interrelated, and will motivate the questions we explore in this work. The second question itself splits naturally into two subcategories: (a) applied machine learning for physics problems, and (b) the theoretical interplay between machine learning and QFT techniques. The area of applied ML to physics has already seen considerable progress.


Organ-Agents: Virtual Human Physiology Simulator via LLMs

Chang, Rihao, Jiao, He, Nie, Weizhi, Guo, Honglin, Xie, Keliang, Wu, Zhenhua, Zhao, Lina, Bai, Yunpeng, Ma, Yongtao, Wang, Lanjun, Su, Yuting, Gao, Xi, Wang, Weijie, Sebe, Nicu, Lepri, Bruno, Sun, Bingwei

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have enabled new possibilities in simulating complex physiological systems. We introduce Organ-Agents, a multi-agent framework that simulates human physiology via LLM-driven agents. Each Simulator models a specific system (e.g., cardiovascular, renal, immune). Training consists of supervised fine-tuning on system-specific time-series data, followed by reinforcement-guided coordination using dynamic reference selection and error correction. We curated data from 7,134 sepsis patients and 7,895 controls, generating high-resolution trajectories across 9 systems and 125 variables. Organ-Agents achieved high simulation accuracy on 4,509 held-out patients, with per-system MSEs <0.16 and robustness across SOFA-based severity strata. External validation on 22,689 ICU patients from two hospitals showed moderate degradation under distribution shifts with stable simulation. Organ-Agents faithfully reproduces critical multi-system events (e.g., hypotension, hyperlactatemia, hypoxemia) with coherent timing and phase progression. Evaluation by 15 critical care physicians confirmed realism and physiological plausibility (mean Likert ratings 3.9 and 3.7). Organ-Agents also enables counterfactual simulations under alternative sepsis treatment strategies, generating trajectories and APACHE II scores aligned with matched real-world patients. In downstream early warning tasks, classifiers trained on synthetic data showed minimal AUROC drops (<0.04), indicating preserved decision-relevant patterns. These results position Organ-Agents as a credible, interpretable, and generalizable digital twin for precision diagnosis, treatment simulation, and hypothesis testing in critical care.


Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation

Song, Zhuo-Yang, Li, Zeyu, Cao, Qing-Hong, Luo, Ming-xing, Zhu, Hua Xing

arXiv.org Artificial Intelligence

The geometric evolution of token representations in large language models (LLMs) presents a fundamental paradox: while human language inherently organizes semantic information in low-dimensional spaces ($\sim 10^1$ dimensions), modern LLMs employ high-dimensional embeddings ($\sim 10^3$ dimensions) processed through Transformer architectures. To resolve this paradox, this work bridges this conceptual gap by developing a geometric framework that tracks token dynamics across Transformers layers. Through layer-wise analysis of intrinsic dimensions across multiple architectures, we reveal an expansion-contraction pattern where tokens diffuse to a "working space" and then progressively project onto lower-dimensional submanifolds. Our finding implies a negative correlation between the working space dimension and parameter-sensitive performance of the LLMs, and indicates that effective models tend to compress tokens into approximately 10-dimensional submanifolds, closely resembling human semantic spaces. This work not only advances LLM interpretability by reframing Transformers layers as projectors that mediate between high-dimensional computation and low-dimensional semantics, but also provides practical tools for model diagnostics that do not rely on task-specific evaluations.


Energy and polarization based on-line interference mitigation in radio interferometry

Yatawatta, Sarod, Boonstra, Albert-Jan, Broekema, Chris P.

arXiv.org Artificial Intelligence

Radio frequency interference (RFI) is a persistent contaminant in terrestrial radio astronomy. While new radio interferometers are becoming operational, novel sources of RFI are also emerging. In order to strengthen the mitigation of RFI in modern radio interferometers, we propose an on-line RFI mitigation scheme that can be run in the correlator of such interferometers. We combine statistics based on the energy as well as the polarization alignment of the correlated signal to develop an on-line RFI mitigation scheme that can be applied to a data stream produced by the correlator in real-time, especially targeted at low duty-cycle or transient RFI detection. In order to improve the computational efficiency, we explore the use of both single precision and half precision floating point operations in implementing the RFI mitigation algorithm. This ideally suits its deployment in accelerator computing devices such as graphics processing units (GPUs) as used by the LOFAR correlator. We provide results based on real data to demonstrate the efficacy of the proposed method.


Learning out-of-time-ordered correlators with classical kernel methods

Tanner, John, Pye, Jason, Wang, Jingbo

arXiv.org Artificial Intelligence

Out-of-Time Ordered Correlators (OTOCs) are widely used to investigate information scrambling in quantum systems. However, directly computing OTOCs with classical computers is often impractical. This is due to the need to simulate the dynamics of quantum many-body systems, which entails exponentially-scaling computational costs with system size. Similarly, exact simulation of the dynamics with a quantum computer (QC) will generally require a fault-tolerant QC, which is currently beyond technological capabilities. Therefore, alternative approaches are needed for computing OTOCs and related quantities. In this study, we explore four parameterised sets of Hamiltonians describing quantum systems of interest in condensed matter physics. For each set, we investigate whether classical kernel methods can accurately learn the XZ-OTOC as well as a particular sum of OTOCs, as functions of the Hamiltonian parameters. We frame the problem as a regression task, generating labelled data via an efficient numerical algorithm that utilises matrix product operators to simulate quantum many-body systems, with up to 40 qubits. Using this data, we train a variety of standard kernel machines and observe that the best kernels consistently achieve a high coefficient of determination ($R^2$) on the testing sets, typically between 0.9 and 0.99, and almost always exceeding 0.8. This demonstrates that classical kernels supplied with a moderate amount of training data can be used to closely and efficiently approximate OTOCs and related quantities for a diverse range of quantum many-body systems.


Why Rectified Power Unit Networks Fail and How to Improve It: An Effective Theory Perspective

Kim, Taeyoung, Kang, Myungjoo

arXiv.org Artificial Intelligence

The Rectified Power Unit (RePU) activation functions, unlike the Rectified Linear Unit (ReLU), have the advantage of being a differentiable function when constructing neural networks. However, it can be experimentally observed when deep layers are stacked, neural networks constructed with RePU encounter critical issues. These issues include the values exploding or vanishing and failure of training. And these happen regardless of the hyperparameter initialization. From the perspective of effective theory, we aim to identify the causes of this phenomenon and propose a new activation function that retains the advantages of RePU while overcoming its drawbacks.


TASI Lectures on Physics for Machine Learning

Halverson, Jim

arXiv.org Artificial Intelligence

These notes are based on lectures I gave at TASI 2024 on Physics for Machine Learning. The focus is on neural network theory, organized according to network expressivity, statistics, and dynamics. I present classic results such as the universal approximation theorem and neural network / Gaussian process correspondence, and also more recent results such as the neural tangent kernel, feature learning with the maximal update parameterization, and Kolmogorov-Arnold networks. The exposition on neural network theory emphasizes a field theoretic perspective familiar to theoretical physicists. I elaborate on connections between the two, including a neural network approach to field theory.


Leveraging neural control variates for enhanced precision in lattice field theory

Bedaque, Paulo F., Oh, Hyunwoo

arXiv.org Artificial Intelligence

Leveraging neural control variates for enhanced precision in lattice field theory Paulo F. Bedaque 1, and Hyunwoo Oh 1, 1 Department of Physics and Maryland Center for Fundamental Physics, University of Maryland, College Park, MD 20742 USA (Dated: May 10, 2024) Results obtained with stochastic methods have an inherent uncertainty due to the finite number of samples that can be achieved in practice. In lattice QCD this problem is particularly salient in some observables like, for instance, observables involving one or more baryons and it is the main problem preventing the calculation of nuclear forces from first principles. The method of control variables has been used extensively in statistics and it amounts to computing the expectation value of the difference between the observable of interest and another observable whose average is known to be zero but is correlated with the observable of interest. Recently, control variates methods emerged as a promising solution in the context of lattice field theories. In our current study, instead of relying on an educated guess to determine the control variate, we utilize a neural network to parametrize this function. Using 1+1 dimensional scalar field theory as a testbed, we demonstrate that this neural network approach yields substantial improvements.


Neural Network Field Theories: Non-Gaussianity, Actions, and Locality

Demirtas, Mehmet, Halverson, James, Maiti, Anindita, Schwartz, Matthew D., Stoner, Keegan

arXiv.org Artificial Intelligence

Both the path integral measure in field theory and ensembles of neural networks describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-$N$) limit, the ensemble of networks corresponds to a free field theory. Although an expansion in $1/N$ corresponds to interactions in the field theory, others, such as in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the $1/N$-expansion, for example by improved behavior with respect to the universal approximation theorem. Given the connected correlators of a field theory, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for neural network field theories. Conversely, the correspondence allows one to engineer architectures realizing a given field theory by representing action deformations as deformations of neural network parameter densities. As an example, $\phi^4$ theory is realized as an infinite-$N$ neural network field theory.