Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
The well-known work of Zhang et al. (2017) highlighted intriguing experimental phenomena about deep net training, specifically optimization and generalization, and asked whether theory could explain them. They showed that sufficiently powerful nets (with vastly more parameters than the number of training samples) can attain zero training error, regardless of whether the data is properly labeled or randomly labeled. Obviously, training with randomly labeled data cannot generalize, whereas training with properly labeled data generalizes. See Figure 2, which replicates some of these results. Recent papers have begun to provide explanations, showing that gradient descent can allow an overparametrized multi-layer net to attain arbitrarily low training error on fairly generic datasets (Du et al., 2018a,c; Li & Liang, 2018; Allen-Zhu et al., 2018b; Zou et al., 2018), provided the amount of overparametrization is a high polynomial of the relevant parameters (i.e., vastly more than the overparametrization in Zhang et al. (2017)).
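To make the phenomenon concrete, here is a minimal NumPy sketch (not a replication of the paper's Figure 2 experiments) on synthetic data: a two-layer ReLU net whose hidden width far exceeds the number of samples, with output weights frozen at random signs and only the first-layer weights trained by gradient descent on squared loss, in the spirit of the setting analyzed here. The data, width, learning rate, and step count are illustrative assumptions; run once with labels produced by a linear rule and once with random labels, both runs drive the training loss toward zero, even though only the former could hope to generalize.

# Minimal sketch of the Zhang et al. (2017) observation on synthetic data.
# All hyperparameters below are illustrative assumptions, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 30, 10, 10000            # n training samples, input dim d, hidden width m >> n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y_true = np.sign(X @ rng.standard_normal(d))    # "properly labeled": labels from a linear rule
y_rand = rng.choice([-1.0, 1.0], size=n)        # "randomly labeled": labels independent of X

def train(y, steps=2000, lr=0.3):
    """Gradient descent on 0.5 * sum_i (f(x_i) - y_i)^2 over the first-layer weights W."""
    W = rng.standard_normal((m, d))              # first-layer weights w_r ~ N(0, I), trained
    a = rng.choice([-1.0, 1.0], size=m)          # output weights a_r in {-1, +1}, frozen
    for _ in range(steps):
        H = X @ W.T                              # (n, m) pre-activations w_r . x_i
        f = np.maximum(H, 0.0) @ a / np.sqrt(m)  # network outputs f(x_i)
        err = f - y                              # residuals
        # dL/dw_r = (1/sqrt(m)) * a_r * sum_i err_i * 1[w_r . x_i > 0] * x_i
        grad = ((err[:, None] * (H > 0) * a[None, :]).T @ X) / np.sqrt(m)
        W -= lr * grad
    f = np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)
    return 0.5 * np.mean((f - y) ** 2)

print("final train loss, true labels:  ", train(y_true))
print("final train loss, random labels:", train(y_rand))

Both printed losses come out near zero: overparametrization lets gradient descent interpolate either label set, so zero training error by itself says nothing about generalization, which is the gap this paper's analysis addresses.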
Jan-24-2019