AITopics | dmft equation

We provide an overview of high dimensional dynamical systems driven by random matrices, focusing on applications to simple models of learning and generalization in machine learning theory. Using both cavity method arguments and path integrals, we review how the behavior of a coupled infinite dimensional system can be characterized as a stochastic process for each single site of the system. We provide a pedagogical treatment of dynamical mean field theory (DMFT), a framework that can be flexibly applied to these settings. The DMFT single site stochastic process is fully characterized by a set of (two-time) correlation and response functions. For linear time-invariant systems, we illustrate connections between random matrix resolvents and the DMFT response. We demonstrate applications of these ideas to machine learning models such as gradient flow, stochastic gradient descent on random feature models and deep linear networks in the feature learning regime trained on random data. We demonstrate how bias and variance decompositions (analysis of ensembling/bagging etc) can be computed by averaging over subsets of the DMFT noise variables. From our formalism we also investigate how linear systems driven with random non-Hermitian matrices (such as random feature models) can exhibit non-monotonic loss curves with training time, while Hermitian matrices with the matching spectra do not, highlighting a different mechanism for non-monotonicity than small eigenvalues causing instability to label noise. Lastly, we provide asymptotic descriptions of the training and test loss dynamics for randomly initialized deep linear neural networks trained in the feature learning regime with high-dimensional random data. In this case, the time translation invariance structure is lost and the hidden layer weights are characterized as spiked random matrices.
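The connection between random matrix resolvents and the response function mentioned in this abstract can be illustrated in the simplest linear time-invariant setting. The sketch below assumes a symmetric positive semidefinite matrix $A \in \mathbb{R}^{N \times N}$, a source term $j(t)$, and a zero initial condition; the symbols $A$, $j$, $R$, and $s$ are illustrative choices and not necessarily the paper's notation.

```latex
\begin{aligned}
\dot{x}(t) &= -A\,x(t) + j(t), \qquad x(0) = 0
  \;\;\Longrightarrow\;\;
  x(t) = \int_0^t e^{-A(t-t')}\, j(t')\, dt', \\
R(t-t') &\equiv \frac{1}{N}\sum_{i=1}^{N} \frac{\delta x_i(t)}{\delta j_i(t')}
  = \frac{1}{N}\operatorname{Tr}\, e^{-A(t-t')} \qquad (t > t'), \\
\int_0^\infty e^{-s\tau} R(\tau)\, d\tau
  &= \frac{1}{N}\operatorname{Tr}\,(sI + A)^{-1}, \qquad s > 0 .
\end{aligned}
```

Under these assumptions the Laplace transform of the trace-averaged response is the normalized trace of the resolvent of $A$ evaluated at $-s$, so spectral information about $A$ translates directly into the time dependence of the response.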

artificial intelligence, machine learning, response function, (18 more...)

arXiv.org Machine Learning

2601.0101

Country: North America > United States (0.27)

Genre: Overview (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

6c81c83c4bd0b58850495f603ab45a93-Supplemental.pdf

Neural Information Processing Systems, Oct-3-2025, 04:21:54 GMT

artificial intelligence, equation, machine learning, (18 more...)

Country: Europe > France (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

6c81c83c4bd0b58850495f603ab45a93-Paper.pdf

Neural Information Processing Systems, Oct-3-2025, 04:21:48 GMT

artificial intelligence, gradient descent, machine learning, (16 more...)

Country:

Europe (0.68)
North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

We thank the reviewers (R1, R2, R3, R4

Neural Information Processing Systems, Oct-3-2025, 04:21:37 GMT

Recent literature (arXiv:1912.00018) investigated the behavior of SGD

artificial intelligence, batch size, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.32)

Precise Dynamics of Diagonal Linear Networks: A Unifying Analysis by Dynamical Mean-Field Theory

Nishiyama, Sota, Imaizumi, Masaaki

arXiv.org Machine Learning, Oct-3-2025

The training dynamics of neural networks have attracted significant attention in deep learning theory. It has been suggested that the dynamics induced by training algorithms strongly influence the generalization performance of neural networks. This effect is captured in the idea of implicit bias (Neyshabur et al., 2014), in which the algorithm selects a certain solution among many induced by nonconvexity of the loss and overparametrization of networks. Accordingly, many recent works have studied the interplay between models and optimizers, aiming to characterize the resulting implicit biases (Neyshabur, 2017; Soudry et al., 2018; Arora et al., 2019; Bartlett et al., 2021). Moreover, understanding the convergence speed and timescales of the training dynamics contributes to efficient training of high-performance models in practice, especially in the context of modern large-scale neural networks in which the training is stopped at a compute-optimal point (Kaplan et al., 2020).

dmft equation, equation, gradient flow, (13 more...)

2510.0193

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems

Neural Information Processing Systems, Mar-3-2024, 06:01:36 GMT

The optimization step in many machine learning problems rarely relies on vanilla gradient descent but it is common practice to use momentum-based accelerated methods. Despite these algorithms being widely applied to arbitrary loss functions, their behaviour in generically non-convex, high dimensional landscapes is poorly understood. In this work, we use dynamical mean field theory techniques to describe analytically the average dynamics of these methods in a prototypical non-convex model: the (spiked) matrix-tensor model. We derive a closed set of equations that describe the behaviour of heavy-ball momentum and Nesterov acceleration in the infinite dimensional limit. By numerical integration of these equations, we observe that these methods speed up the dynamics but do not improve the algorithmic threshold with respect to gradient descent in the spiked model.
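For readers unfamiliar with the two accelerated methods named in this abstract, the update rules can be sketched as follows. This is a minimal illustration under loud assumptions: an ill-conditioned convex quadratic stands in for the loss, and the step size, momentum coefficient, and function names are hypothetical choices. It is not the spiked matrix-tensor model, and it does not integrate the DMFT equations derived in the paper.

```python
# Illustrative only: a convex quadratic f(x) = 0.5 * sum(h_i * x_i**2) stands in
# for the loss. This is NOT the spiked matrix-tensor model from the paper.

def grad(x, curvatures):
    """Gradient of the quadratic: component i is h_i * x_i."""
    return [h * xi for h, xi in zip(curvatures, x)]

def loss(x, curvatures):
    return 0.5 * sum(h * xi * xi for h, xi in zip(curvatures, x))

def gradient_descent(x0, curvatures, lr, steps):
    x = list(x0)
    for _ in range(steps):
        g = grad(x, curvatures)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def heavy_ball(x0, curvatures, lr, beta, steps):
    """Heavy-ball momentum: the velocity accumulates past gradients."""
    x, v = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        g = grad(x, curvatures)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x

def nesterov(x0, curvatures, lr, beta, steps):
    """Nesterov acceleration: the gradient is taken at a look-ahead point."""
    x, v = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        lookahead = [xi + beta * vi for xi, vi in zip(x, v)]
        g = grad(lookahead, curvatures)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x

# Hypothetical settings: one stiff and one flat direction.
curvatures = [1.0, 0.01]
x0 = [1.0, 1.0]
lr, beta, steps = 1.0, 0.9, 200

losses = {
    "gd": loss(gradient_descent(x0, curvatures, lr, steps), curvatures),
    "heavy_ball": loss(heavy_ball(x0, curvatures, lr, beta, steps), curvatures),
    "nesterov": loss(nesterov(x0, curvatures, lr, beta, steps), curvatures),
}
for name, value in losses.items():
    print(f"{name}: final loss = {value:.3e}")
```

On this toy problem both momentum variants reach a lower loss than plain gradient descent in the same number of steps, because the flat direction contracts faster per step. That matches the speed-up the abstract reports, but the toy is convex, so it says nothing about the algorithmic threshold result.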

algorithm, equation, gradient descent, (13 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning
