Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

Allen-Zhu, Zeyuan, Li, Yuanzhi, Liang, Yingyu

Nov-12-2018–arXiv.org Machine Learning

University of Wisconsin-Madison Abstract Neural networks have great success in many machine learning applications, but the fundamental learning theory behind them remains largely unsolved. Learning neural networks is NPhard, but in practice, simple algorithms like stochastic gradient descent(SGD) often produce good solutions. Moreover, it is observed that overparameterization-- designing networks whose number of parameters is larger than statistically needed to perfectly fit the data -- improves both optimization and generalization, appearing to contradict traditional learning theory. In this work, we extend the theoretical understanding of two and three-layer neural networks in the overparameterized regime. We prove that, using overparameterized neural networks, one can (improperly) learn some notable hypothesis classes, including two and three-layer neural networks with fewer parameters. Moreover, the learning process can be simply done by SGD or its variants in polynomial time using polynomially many samples. We also show that for a fixed sample size, the generalization error of the solution found by some SGD variant can be made almost independent of the number of parameters in the overparameterized network. Authors sorted in alphabetical order. 1 Introduction In contrast to the widely accepted empirical success, much less theory is known. Despite a recent increase of theoretical studies, many questions remain largely open, including fundamental ones about the optimization and generalization in learning neural networks.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Machine Learning

Nov-12-2018

arXiv.org PDF

Add feedback

Country:
- North America > United States > Wisconsin > Dane County > Madison (0.24)

Genre:
- Research Report (0.82)

Industry:
- Education (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (0.54)
  - Neural Networks > Deep Learning (0.45)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found