Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Neural Information Processing Systems

A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
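The abstract's central claim can be illustrated numerically. The sketch below (not code from the paper; the two-layer tanh architecture, width, data, and learning rate are arbitrary choices for illustration) trains a wide network under the NTK parameterization alongside its first-order Taylor expansion around the initial parameters, using full-batch gradient descent on a squared loss, and checks that the two models' predictions remain close throughout training.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2048, 3, 8              # hidden width, input dim, train-set size
X = rng.normal(size=(m, d))       # toy inputs
y = rng.normal(size=m)            # toy targets

# NTK parameterization: f(x) = v . tanh(W x / sqrt(d)) / sqrt(n),
# with standard-normal initial parameters theta_0 = (W0, v0).
W0 = rng.normal(size=(n, d))
v0 = rng.normal(size=n)

def f(W, v):
    return np.tanh(X @ W.T / np.sqrt(d)) @ v / np.sqrt(n)

# Jacobian of f at theta_0; the linearized model keeps it frozen.
T0 = np.tanh(X @ W0.T / np.sqrt(d))                       # (m, n)
Jv = T0 / np.sqrt(n)                                      # df/dv at theta_0
JW = (v0[None, :, None] * (1 - T0**2)[:, :, None]
      * X[:, None, :]) / np.sqrt(n * d)                   # df/dW at theta_0
f0 = f(W0, v0)

def f_lin(W, v):
    # First-order Taylor expansion of f around theta_0.
    return f0 + Jv @ (v - v0) + np.einsum('mnd,nd->m', JW, W - W0)

lr, steps = 0.1, 500
W, v = W0.copy(), v0.copy()       # nonlinear network's parameters
Wl, vl = W0.copy(), v0.copy()     # linearized model's parameters
for _ in range(steps):
    # Gradient descent on the nonlinear network (squared loss).
    r = f(W, v) - y
    T = np.tanh(X @ W.T / np.sqrt(d))
    dv = (T.T @ r) / np.sqrt(n)
    dW = ((r[:, None] * (1 - T**2) * v[None, :]).T @ X) / np.sqrt(n * d)
    v -= lr * dv
    W -= lr * dW
    # Gradient descent on the linearized model (Jacobian frozen at theta_0).
    rl = f_lin(Wl, vl) - y
    vl -= lr * Jv.T @ rl
    Wl -= lr * np.einsum('m,mnd->nd', rl, JW)

print('max |f - f_lin| on the train set:',
      np.abs(f(W, v) - f_lin(Wl, vl)).max())
```

At this width the two sets of predictions typically agree to a few hundredths, and the gap shrinks as `n` grows, consistent with the paper's O(1/sqrt(n)) characterization of finite-width corrections.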


Reviews: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Neural Information Processing Systems

The paper was carefully proofread, well-structured, and very clear. The experiments were described clearly and in detail, and provided relevant results. Below we outline some detailed comments on the results. In particular, Chizat and Bach prove that the training of an NTK-parameterized network is closely modeled by "lazy training" (their term for a linearized model). That paper is not referenced in the related work section.


Reviews: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Neural Information Processing Systems

This paper studies deep neural networks in the regime where the layer widths grow to infinity. Its main contribution is to show that the dynamics of gradient descent for optimizing an infinite-width neural network can be explained by the first-order Taylor expansion of the network around its initial parameters, as given by the NTK of Jacot et al. Reviewers all agreed this is a valuable contribution that aids current efforts to understand the inner workings of gradient descent on large neural networks and its role in generalisation. Despite some concerns about the applicability of this regime to explaining the empirical performance of large deep nets, and about concurrent work (Chizat and Bach), the authors successfully addressed these concerns in the rebuttal, and the AC therefore recommends acceptance.
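The first-order Taylor expansion the reviewers refer to, and the resulting training dynamics, can be stated compactly (notation here follows the standard NTK formulation; $X$ and $y$ denote the training inputs and targets):

```latex
% Linearization of the network around its initial parameters \theta_0:
f^{\mathrm{lin}}(\theta, x) \;=\; f(\theta_0, x)
  \;+\; \nabla_\theta f(\theta_0, x)^{\top} (\theta - \theta_0).

% Under gradient flow on the squared loss, the linearized model's
% predictions evolve according to the empirical neural tangent kernel
% \hat{\Theta}(x, x') = \nabla_\theta f(\theta_0, x)^{\top}
%                       \nabla_\theta f(\theta_0, x'):
\frac{d}{dt} f^{\mathrm{lin}}_t(x) \;=\;
  -\,\eta\, \hat{\Theta}(x, X)\,\bigl(f^{\mathrm{lin}}_t(X) - y\bigr).
```

Because the kernel is fixed at initialization, this is a linear ODE with a closed-form solution; the paper's result is that the true network's dynamics converge to these as the width goes to infinity.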



Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Lee, Jaehoon, Xiao, Lechao, Schoenholz, Samuel, Bahri, Yasaman, Novak, Roman, Sohl-Dickstein, Jascha, Pennington, Jeffrey

Neural Information Processing Systems



Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Lee, Jaehoon, Xiao, Lechao, Schoenholz, Samuel S., Bahri, Yasaman, Sohl-Dickstein, Jascha, Pennington, Jeffrey

arXiv.org Machine Learning
