Neural networks as Interacting Particle Systems: Asymptotic convexity of the Loss Landscape and Universal Scaling of the Approximation Error

Rotskoff, Grant M., Vanden-Eijnden, Eric

arXiv.org Machine Learning 

These successes evince an ability to accurately represent high dimensional functions, potentially of great use in computational and applied mathematics. That said, there are few rigorous results about the representation error and trainability of neural networks, as well as how they scale with the network size. Here we characterize both the error and scaling by reinterpreting the standard optimization algorithm used in machine learning applications, stochastic gradient descent, as the evolution of a particle system with interactions governed by a potential related to the objective or "loss" function used to train the network. We show that, when the number n of parameters is large, the empirical distribution of the particles descends on a convex landscape towards a minimizer at a rate independent of n . Our analysis also quantifies the scale and nature of the noise introduced by stochastic gradient descent and provides guidelines for the step size and batch size to use when training a neural network. We illustrate our findings on examples in which we train neural network to learn the energy function of the continuous 3-spin model on the sphere. The approximation error scales as our analysis predicts in as high a dimension as d 25. Finite training set and stochastic gradient descent 6 1.6. Central Limit Theorem (CL T) 13 3. Finite training set and stochastic gradient descent (SGD) 14 3.1. We thank Weinan E for discussions about the approximation error of neural networks and Sylvia Serfaty for her insights about interacting particle systems. M OTIVATION AND MAIN RESULTS While classification problems continue to be an active area research, extraordinary progress has been made on both speech and image recognition, problems that appeared intractable only a decade ago [1]. By harvesting the power of neural networks while simultaneously benefiting from advances in computational hardware, complex tasks such as automatic language translation are now routinely performed by computers with a high degree of reliability . The underlying explanation for these significant advances seems to be related to the expressive power of neural networks, and their ability to represent high dimensional functions with accuracy . These successes open exciting possibilities in applied and computational mathematics that are only beginning to be explored [2-8].

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found