Global convergence of neuron birth-death dynamics

Grant Rotskoff, Samy Jelassi, Joan Bruna, Eric Vanden-Eijnden

arXiv.org (Machine Learning)

As a consequence of the universal approximation theorems, sufficiently wide single-layer neural networks are expressive enough to accurately represent a broad class of functions [Cyb89, Bar93, PS91]. The existence of a neural network function arbitrarily close to a given target function, however, does not guarantee that any particular optimization procedure can identify the optimal parameters. Recently, using mathematical tools from optimal transport theory and interacting particle systems, it was shown that networks trained with gradient descent [RVE18b, MMN18, SS18, CB18b] or stochastic gradient descent converge asymptotically to the target function in the large-data limit. This analysis relies on taking a "mean-field" limit in which the number of parameters n tends to infinity. In this setting, the gradient descent optimization dynamics are described by a partial differential equation (PDE), corresponding to a Wasserstein gradient flow on a convex energy functional. While this PDE provides a powerful conceptual framework for analyzing the properties of neural networks evolving under gradient descent dynamics, the formulation confers few immediate practical advantages.
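To make the mean-field picture concrete, the following is a brief sketch of the limiting objects referred to above. The symbols f*, φ, F, and K are illustrative notation introduced here, not taken from this abstract, but they follow the conventions of the cited mean-field analyses [RVE18b, MMN18, CB18b]: the network is an average over n units, the empirical measure over unit parameters converges to a density ρ, the population risk becomes an energy functional of ρ, and gradient descent corresponds to a Wasserstein gradient flow on that functional.

```latex
% Hedged sketch; f^*, \varphi, F, K are illustrative notation, not definitions from the abstract.
f_n(x) = \frac{1}{n}\sum_{i=1}^{n} \varphi(\theta_i, x),
\qquad
\rho_n = \frac{1}{n}\sum_{i=1}^{n}\delta_{\theta_i} \;\longrightarrow\; \rho \quad (n\to\infty),
\\[4pt]
E[\rho] = \int F(\theta)\,\mathrm{d}\rho(\theta)
        + \tfrac{1}{2}\iint K(\theta,\theta')\,\mathrm{d}\rho(\theta)\,\mathrm{d}\rho(\theta') + \mathrm{const},
\\[4pt]
F(\theta) = -\,\mathbb{E}_x\!\left[f^*(x)\,\varphi(\theta,x)\right],
\qquad
K(\theta,\theta') = \mathbb{E}_x\!\left[\varphi(\theta,x)\,\varphi(\theta',x)\right],
\\[4pt]
\partial_t \rho = \nabla_\theta\!\cdot\!\Big(\rho\,\nabla_\theta \tfrac{\delta E}{\delta \rho}\Big),
\qquad
\tfrac{\delta E}{\delta \rho}(\theta) = F(\theta) + \int K(\theta,\theta')\,\mathrm{d}\rho(\theta').
```

For readers who prefer code, here is a minimal, self-contained Python/NumPy illustration of plain gradient descent on a wide single-hidden-layer network in the mean-field scaling. This is an assumed toy setup, not the authors' implementation; the synthetic teacher target, tanh activation, and hyperparameters are arbitrary choices made for the example.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's code): gradient descent on a
# wide single-hidden-layer network in the mean-field scaling
#   f(x) = (1/n) * sum_i c_i * tanh(w_i . x).
rng = np.random.default_rng(0)
d, n, lr, steps = 2, 1000, 0.5, 2000

# Synthetic "teacher" target built from a few fixed neurons (illustrative only).
W_star = rng.normal(size=(5, d))
c_star = rng.normal(size=5)
def target(X):
    return np.tanh(X @ W_star.T) @ c_star / 5.0

# Student particles theta_i = (c_i, w_i); the empirical measure over these
# particles is the object that converges to the mean-field density rho.
W = rng.normal(size=(n, d))
c = rng.normal(size=n)

X = rng.normal(size=(512, d))   # fixed sample standing in for the data distribution
y = target(X)

for _ in range(steps):
    H = np.tanh(X @ W.T)                          # hidden activations, shape (batch, n)
    r = H @ c / n - y                             # residuals of the mean-field-scaled output
    grad_c = H.T @ r / (n * len(X))               # gradient of 0.5 * mean(r**2) w.r.t. c
    grad_W = ((1.0 - H**2) * r[:, None] * c[None, :] / n).T @ X / len(X)
    # Scaling the step by n keeps each particle moving at O(1) speed, a common
    # convention when matching finite-n gradient descent to the mean-field flow.
    c -= lr * n * grad_c
    W -= lr * n * grad_W

mse = np.mean((np.tanh(X @ W.T) @ c / n - y) ** 2)
print(f"final mse: {mse:.3e}")
```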
