Provable Convergence of Nesterov Accelerated Method for Over-Parameterized Neural Networks

Jul-5-2021–arXiv.org Artificial Intelligence

Despite the empirical success of deep learning, it still lacks theoretical understandings to explain why randomly initialized neural network trained by first-order optimization methods is able to achieve zero training loss, even though its landscape is non-convex and non-smooth. Recently, there are some works to demystifies this phenomenon under over-parameterized regime. In this work, we make further progress on this area by considering a commonly used momentum optimization algorithm: Nesterov accelerated method (NAG). We analyze the convergence of NAG for two-layer fully connected neural network with ReLU activation. Specifically, we prove that the error of NAG converges to zero at a linear convergence rate $1-\Theta(1/\sqrt{\kappa})$, where $\kappa > 1$ is determined by the initialization and the architecture of neural network. Comparing to the rate $1-\Theta(1/\kappa)$ of gradient descent, NAG achieves an acceleration. Besides, it also validates NAG and Heavy-ball method can achieve a similar convergence rate.

deep learning, neural network, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

Jul-5-2021

arXiv.org PDF

Add feedback

Country:
- Asia > China (0.14)
- North America > Canada (0.14)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.34)
  - Statistical Learning > Gradient Descent (0.36)