Overparameterization of deep ResNet: zero loss and mean-field analysis
Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright
Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, yet a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero loss) in many practical situations. We examine this phenomenon for Residual Neural Networks (ResNet) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of neurons in each layer (width) go to infinity. First, we use a mean-field-limit argument to prove that, in the large-NN limit, gradient descent for parameter training becomes a gradient flow for a probability distribution characterized by a partial differential equation (PDE). Next, we show that the solution to the PDE converges, as training time increases, to a zero-loss solution. Together, these results imply that training the ResNet yields near-zero loss provided the ResNet is large enough. We give estimates of the depth and width needed to reduce the loss below a given threshold, with high probability.
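To make the setting concrete, the sketch below (not the authors' code; all hyperparameters and names are illustrative assumptions) builds a ResNet of depth L and width M with a smooth activation (tanh), scales each residual block by 1/L, and trains it by plain gradient descent on a small synthetic regression task.

```python
import jax
import jax.numpy as jnp

L, M, D = 32, 64, 4                      # depth, width, input dimension (illustrative)
key = jax.random.PRNGKey(0)
kx, ky, kw = jax.random.split(key, 3)

X = jax.random.normal(kx, (128, D))               # synthetic inputs
y = jnp.sin(X @ jax.random.normal(ky, (D,)))      # synthetic targets

def init_params(key):
    keys = jax.random.split(key, L)
    # one residual block per layer: a width-M hidden layer with a smooth activation
    return [(jax.random.normal(k, (D, M)) / jnp.sqrt(D),
             jax.random.normal(k, (M, D)) / jnp.sqrt(M)) for k in keys]

def forward(params, x):
    z = x
    for W1, W2 in params:
        # residual update scaled by 1/L; with this scaling the layer index
        # acts like a time discretization of a continuous dynamics as L grows
        z = z + (1.0 / L) * jnp.tanh(z @ W1) @ W2
    return z.sum(axis=-1)                # simple scalar readout

def loss(params):
    return jnp.mean((forward(params, X) - y) ** 2)

grad_loss = jax.jit(jax.grad(loss))

params = init_params(kw)
lr = 0.1
for step in range(1000):
    grads = grad_loss(params)
    params = [(W1 - lr * g1, W2 - lr * g2)
              for (W1, W2), (g1, g2) in zip(params, grads)]

print("final training loss:", float(loss(params)))
```

Roughly speaking, the 1/L scaling of each residual block is what keeps the infinite-depth limit well defined: the layer updates discretize a continuous-time dynamics, which is the starting point for the mean-field/PDE description of training analyzed in the paper.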
May-29-2021