AITopics

2605.1837

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence (0.69)

arXiv.org Machine Learning, May-20-2026

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

Levin, Ilya, Shuklin, Maksim, Moulines, Eric, Mangold, Paul, Samsonov, Sergey

In this paper, we establish Berry-Esseen-type bounds for federated linear stochastic approximation (LSA). Our results provide the first federated Gaussian approximations for LSA that explicitly capture communication-computation trade-offs and heterogeneity-aware error terms, quantifying the effects of the local step size, the number of local updates, and heterogeneity on convergence rates. We present results for both (i) the constant step size regime and (ii) the decreasing step size regime with an increasing number of local iterations, recovering the recent rates of Bonnerjee et al. [2025] as a special case. As a primary application of our results, we develop an online multiplier bootstrap procedure for inference on the last iterate, which avoids explicit estimation of the asymptotic covariance matrix, and we obtain non-asymptotic validity guarantees for this procedure.
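To make the idea of an online multiplier bootstrap concrete, here is a minimal single-agent sketch with a constant step size. It is not the federated procedure of the paper: the system (A_bar, b_bar), the noise level, the step size alpha, and the choice of exponential weights (mean 1, variance 1) are all illustrative assumptions. Each bootstrap trajectory reuses the same observations as the main iterate but rescales every update by a random weight, so no covariance matrix is ever estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem: find theta_star solving A_bar @ theta = b_bar
# from noisy observations (A_k, b_k). All constants are assumptions.
d, n_steps, n_boot, alpha = 2, 5000, 200, 0.02
A_bar = np.array([[1.0, 0.2], [0.2, 1.0]])
b_bar = np.array([1.0, -1.0])
theta_star = np.linalg.solve(A_bar, b_bar)

theta = np.zeros(d)                 # main LSA iterate
theta_boot = np.zeros((n_boot, d))  # perturbed bootstrap iterates

for k in range(n_steps):
    # One noisy observation of the linear system.
    A_k = A_bar + 0.1 * rng.normal(size=(d, d))
    b_k = b_bar + 0.1 * rng.normal(size=d)

    # Plain LSA update.
    theta = theta - alpha * (A_k @ theta - b_k)

    # Bootstrap updates: same (A_k, b_k), but each trajectory scales its
    # step by an independent random weight with mean 1 and variance 1.
    w = rng.exponential(1.0, size=n_boot)
    residual = theta_boot @ A_k.T - b_k          # shape (n_boot, d)
    theta_boot = theta_boot - alpha * w[:, None] * residual

# The spread of the bootstrap iterates around the main iterate is used as
# a proxy for the unknown distribution of theta - theta_star.
deviations = np.linalg.norm(theta_boot - theta, axis=1)
radius = np.quantile(deviations, 0.95)

print("last iterate:", theta)
print("true solution:", theta_star)
print("bootstrap 95% radius:", radius)
```

The ball of the computed radius around the last iterate plays the role of a confidence set. Everything is updated online, so memory use is proportional to the number of bootstrap trajectories rather than to the number of observations.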

approximation, machine learning, reinforcement learning, (19 more...)

2605.19629

Country:

Europe (0.45)
North America > United States (0.27)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Neural Information Processing Systems, Apr-25-2026, 17:13:18 GMT

Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks

We analyze the dynamics of finite width effects in wide but finite feature learning neural networks. Starting from a dynamical mean field theory (DMFT) description of infinite width deep neural network kernel and prediction dynamics, we provide a characterization of the $O(1/\sqrt{\text{width}})$ fluctuations of the DMFT order parameters over random initializations of the network weights. Our results, while perturbative in width, unlike prior analyses, are non-perturbative in the strength of feature learning. In the lazy limit of network training, all kernels are random but static in time and the prediction variance has a universal form. However, in the rich, feature learning regime, the fluctuations of the kernels and predictions are dynamically coupled with a variance that can be computed self-consistently.

artificial intelligence, machine learning, variance, (18 more...)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Neural Information Processing Systems, Apr-25-2026, 04:19:44 GMT

Limiting fluctuation and trajectorial stability of multilayer neural networks with mean field training

The mean field theory of multilayer neural networks centers around a particular infinite-width scaling, in which the learning dynamics is shown to be closely tracked by the mean field limit. A random fluctuation around this infinite-width limit is expected from a large-width expansion to the next order. This fluctuation has been studied only in the case of shallow networks, where previous works employ heavily technical notions or additional formulation ideas amenable only to that case. Treatment of the multilayer case has been missing, with the chief difficulty in finding a formulation that must capture the stochastic dependency across not only time but also depth. In this work, we initiate the study of the fluctuation in the case of multilayer networks, at any network depth.

artificial intelligence, machine learning, neural network, (16 more...)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, Apr-15-2026

Loop Corrections to the Training and Generalization Errors of Random Feature Models

Kim, Taeyoung

We investigate random feature models in which neural networks sampled from a prescribed initialization ensemble are frozen and used as random features, with only the readout weights optimized. Adopting a statistical-physics viewpoint, we study the training, test, and generalization errors beyond the mean-kernel approximation. Since the predictor is a nonlinear functional of the induced random kernel, the ensemble-averaged errors depend not only on the mean kernel but also on higher-order fluctuation statistics. Within an effective field-theoretic framework, these finite-width contributions naturally appear as loop corrections. We derive the loop corrections to the training, test, and generalization errors, obtain their scaling laws, and support the theory with experimental verification.

artificial intelligence, correction, machine learning, (16 more...)

2604.12827

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Neural Information Processing Systems, Mar-17-2026, 00:03:30 GMT

Mental Sampling in Multimodal Representations

Both resources in the natural environment and concepts in a semantic space are distributed patchily, with large gaps in between the patches. To describe people's internal and external foraging behavior, various random walk models have been proposed. In particular, internal foraging has been modeled as sampling: in order to gather relevant information for making a decision, people draw samples from a mental representation using random-walk algorithms such as Markov chain Monte Carlo (MCMC). However, two common empirical observations argue against people using simple sampling algorithms such as MCMC for internal foraging. First, the distance between samples is often best described by a Lévy flight distribution: the probability of the distance between two successive locations follows a power law in the distance.

artificial intelligence, machine learning, proceedings, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Neural Information Processing Systems, Mar-16-2026, 18:27:36 GMT

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters $n$ is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as $O(n^{-1})$. In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale $O(n^{-1})$. These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions.

artificial intelligence, machine learning, proceedings, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, Feb-16-2026

Thermodynamic Isomorphism of Transformers: A Lagrangian Approach to Attention Dynamics

Kim, Gunn

We propose an effective field-theoretic framework for analyzing Transformer attention through a thermodynamic lens. By constructing a Lagrangian on the information manifold equipped with the Fisher metric, we show that, within the Shannon--Boltzmann entropy framework, the Softmax function arises as a stationary solution minimizing a Helmholtz free energy functional. This establishes a formal correspondence between scaled dot-product attention and canonical ensemble statistics. Extending this mapping to macroscopic observables, we define an effective specific heat associated with fluctuations of the attention energy landscape. In controlled experiments on the modular addition task ($p = 19$--$113$), we observe a robust peak in this fluctuation measure that consistently precedes the onset of generalization. While no asymptotic power-law divergence is detected in this finite-depth regime, the reproducible enhancement of energy variance suggests a critical-like crossover accompanying representational reorganization. Our framework provides a unified statistical-mechanical perspective on attention scaling, training dynamics, and positional encoding, interpreting the phenomena as emergent properties of an effective thermodynamic system rather than isolated heuristics. Although the present results indicate finite-size crossover behavior rather than a strict phase transition, they motivate further investigation into scaling limits of deep architectures through fluctuation-based observables.
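The correspondence between Softmax and canonical ensemble statistics can be checked numerically. The sketch below is not taken from the paper: it assumes the energy of key j is E_j = -q·k_j and the temperature is T = sqrt(d_k), so that the Boltzmann weights coincide with scaled dot-product attention. It then verifies the textbook facts that these weights minimize F[p] = <E>_p - T S[p] over the probability simplex, with minimum value -T log Z. The last lines compute the standard canonical specific heat Var(E)/T^2 (with k_B = 1), which may differ from the paper's effective definition.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - np.max(scores)
    weights = np.exp(shifted)
    return weights / weights.sum()

def free_energy(p, energies, temperature):
    """F[p] = <E>_p - T * S[p], with Shannon entropy S[p] = -sum p log p."""
    p = np.clip(p, 1e-300, 1.0)  # avoid log(0)
    return float(p @ energies + temperature * np.sum(p * np.log(p)))

rng = np.random.default_rng(0)
d_k, n_keys = 16, 8
query = rng.normal(size=d_k)
keys = rng.normal(size=(n_keys, d_k))

# Assumed mapping: energy of key j is minus its dot product with the query.
energies = -(keys @ query)
temperature = np.sqrt(d_k)

# Boltzmann weights exp(-E/T)/Z, i.e. softmax(q.k / sqrt(d_k)).
attn = softmax(-energies / temperature)

# Free energy at the softmax distribution versus the closed form -T log Z.
f_star = free_energy(attn, energies, temperature)
f_closed = -temperature * np.log(np.sum(np.exp(-energies / temperature)))

# Any other distribution on the simplex should have a larger free energy,
# because F[p] - F[p*] = T * KL(p || p*) >= 0.
f_random = [free_energy(p, energies, temperature)
            for p in rng.dirichlet(np.ones(n_keys), size=1000)]

# Canonical energy fluctuations and the associated specific heat (k_B = 1).
energy_var = float(attn @ energies**2 - (attn @ energies) ** 2)
specific_heat = energy_var / temperature**2

print("F at softmax:", f_star, " closed form:", f_closed)
print("smallest F among random distributions:", min(f_random))
print("specific heat:", specific_heat)
```

Tracking a fluctuation measure of this kind over training steps is one way a peak preceding generalization could be observed, though the experimental setup in the paper is not reproduced here.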

artificial intelligence, machine learning, natural language, (20 more...)

2602.08216

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Neural Information Processing Systems, Feb-12-2026, 13:07:01 GMT

e6f29fb27bb400f89f5584c175005679-Paper-Conference.pdf

fluctuation, information rate, mutual information rate, (14 more...)

Country:

North America > United States (0.04)
Europe > Spain (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

Grant Rotskoff, Eric Vanden-Eijnden

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Neural Information Processing Systems, Feb-12-2026, 08:33:52 GMT

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function.

artificial intelligence, arxiv, machine learning, (16 more...)

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)