Global convergence of neuron birth-death dynamics
Grant Rotskoff, Samy Jelassi, Joan Bruna, Eric Vanden-Eijnden
As a consequence of the universal approximation theorems, sufficiently wide single-layer neural networks are expressive enough to represent a broad class of functions accurately [Cyb89, Bar93, PS91]. The existence of a network function arbitrarily close to a given target function, however, does not guarantee that any particular optimization procedure can identify the corresponding parameters. Recently, using mathematical tools from optimal transport theory and interacting particle systems, it was shown that networks trained with gradient descent [RVE18b, MMN18, SS18, CB18b] or stochastic gradient descent converge asymptotically to the target function in the large-data limit. This analysis relies on taking a "mean-field" limit in which the number of parameters n tends to infinity. In this setting, the gradient descent dynamics are described by a partial differential equation (PDE) corresponding to a Wasserstein gradient flow on a convex energy functional. While this PDE provides a powerful conceptual framework for analyzing the properties of neural networks evolving under gradient descent, the formulation confers few immediate practical advantages.
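To make the mean-field setup concrete, here is a minimal sketch, not the authors' algorithm: gradient descent on a single-hidden-layer network in the mean-field scaling f(x) = (1/n) Σ_i c_i σ(a_i · x), with an illustrative birth-death resampling step loosely motivated by the title. The fitness score |c_i|, the mutation noise, and all hyperparameters below are assumptions made for illustration.

```python
# A minimal sketch, not the authors' algorithm: gradient descent on a
# single-hidden-layer network in the mean-field scaling
#     f(x) = (1/n) * sum_i c_i * tanh(a_i . x),
# plus an illustrative birth-death resampling step. The fitness score
# |c_i|, the mutation noise, and all hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr, steps = 256, 2, 0.5, 2000

# Synthetic regression data; the target is itself a small network.
X = rng.normal(size=(1024, d))
y = np.tanh(X @ rng.normal(size=d))

a = rng.normal(size=(n, d))  # input weights, one row per neuron
c = rng.normal(size=n)       # output weights

for t in range(steps):
    h = np.tanh(X @ a.T)          # (batch, n) hidden activations
    err = h @ c / n - y           # residual of the mean-field predictor
    loss = 0.5 * np.mean(err**2)

    # Gradients of the loss; note the 1/n from the parameterization.
    grad_c = h.T @ err / (n * len(y))
    grad_a = ((err[:, None] * (1.0 - h**2) * c).T @ X) / (n * len(y))
    # Scale the step by n so each neuron moves at an O(1) rate.
    c -= lr * n * grad_c
    a -= lr * n * grad_a

    if t % 500 == 499:
        # Birth-death step: resample neurons with probability
        # proportional to |c_i| (a crude stand-in for the rate
        # functional in the paper), then mutate slightly so that
        # duplicated neurons can separate under later gradients.
        p = np.abs(c) / np.abs(c).sum()
        idx = rng.choice(n, size=n, p=p)
        a = a[idx] + 0.01 * rng.normal(size=(n, d))
        c = c[idx].copy()
        print(f"step {t+1}: loss = {loss:.5f}")
```

Under this scaling, the empirical distribution of the pairs (a_i, c_i) plays the role of the measure whose Wasserstein gradient flow the PDE describes; the resampling step only indicates where a birth-death reaction term would act, not how the paper defines it.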
Feb-5-2019