Global convergence of neuron birth-death dynamics
Grant Rotskoff, Samy Jelassi, Joan Bruna, Eric Vanden-Eijnden
As a consequence of the universal approximation theorems, sufficiently wide single-layer neural networks are expressive enough to represent a broad class of functions accurately [Cyb89, Bar93, PS91]. The existence of a network function arbitrarily close to a given target function, however, does not guarantee that any particular optimization procedure can identify the corresponding parameters. Recently, using mathematical tools from optimal transport theory and interacting particle systems, it was shown that networks trained with gradient descent [RVE18b, MMN18, SS18, CB18b] or stochastic gradient descent converge asymptotically to the target function in the large-data limit. This analysis relies on taking a "mean-field" limit in which the number of parameters n tends to infinity. In this setting, the gradient descent dynamics are described by a partial differential equation (PDE) corresponding to a Wasserstein gradient flow on a convex energy functional. While this PDE provides a powerful conceptual framework for analyzing the properties of neural networks evolving under gradient descent, the formulation confers few immediate practical advantages.
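To make the mean-field setup concrete, here is a minimal sketch, not the authors' algorithm: gradient descent on a single-hidden-layer network in the mean-field scaling f(x) = (1/n) Σ_i c_i σ(a_i · x), with an illustrative birth-death resampling step loosely motivated by the title. The fitness score |c_i|, the mutation noise, and all hyperparameters below are assumptions made for illustration.

```python
# A minimal sketch, not the authors' algorithm: gradient descent on a
# single-hidden-layer network in the mean-field scaling
#     f(x) = (1/n) * sum_i c_i * tanh(a_i . x),
# plus an illustrative birth-death resampling step. The fitness score
# |c_i|, the mutation noise, and all hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr, steps = 256, 2, 0.5, 2000

# Synthetic regression data; the target is itself a small network.
X = rng.normal(size=(1024, d))
y = np.tanh(X @ rng.normal(size=d))

a = rng.normal(size=(n, d))  # input weights, one row per neuron
c = rng.normal(size=n)       # output weights

for t in range(steps):
    h = np.tanh(X @ a.T)          # (batch, n) hidden activations
    err = h @ c / n - y           # residual of the mean-field predictor
    loss = 0.5 * np.mean(err**2)

    # Gradients of the loss; note the 1/n from the parameterization.
    grad_c = h.T @ err / (n * len(y))
    grad_a = ((err[:, None] * (1.0 - h**2) * c).T @ X) / (n * len(y))
    # Scale the step by n so each neuron moves at an O(1) rate.
    c -= lr * n * grad_c
    a -= lr * n * grad_a

    if t % 500 == 499:
        # Birth-death step: resample neurons with probability
        # proportional to |c_i| (a crude stand-in for the rate
        # functional in the paper), then mutate slightly so that
        # duplicated neurons can separate under later gradients.
        p = np.abs(c) / np.abs(c).sum()
        idx = rng.choice(n, size=n, p=p)
        a = a[idx] + 0.01 * rng.normal(size=(n, d))
        c = c[idx].copy()
        print(f"step {t+1}: loss = {loss:.5f}")
```

Under this scaling, the empirical distribution of the pairs (a_i, c_i) plays the role of the measure whose Wasserstein gradient flow the PDE describes; the resampling step only indicates where a birth-death reaction term would act, not how the paper defines it.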
Feb-5-2019