SGD Converges to Global Minimum in Deep Learning via Star-convex Path

Zhou, Yi, Yang, Junjie, Zhang, Huishuai, Liang, Yingbin, Tarokh, Vahid

Jan-2-2019–arXiv.org Machine Learning

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding on how and why SGD can train these complex networks towards a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In such a context, our analysis shows that SGD, although has long been considered as a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.

neural network, sgd, star-convex path, (15 more...)

arXiv.org Machine Learning

Jan-2-2019

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Ohio (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China (0.04)

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found