Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks

Li, Mingchen, Soltanolkotabi, Mahdi, Oymak, Samet

Apr-7-2019–arXiv.org Machine Learning

Deep neural networks (DNNs) are ubiquitous in a growing number of domains ranging from computer vision to healthcare. State-of-the-art DNN models are typically overparameterized and contain more parameters than the size of the training dataset. It is well understood that in this overparameterized regime, DNNs are highly expressive and have the capacity to (over)fit arbitrary training datasets, including pure noise [56]. Mysteriously, however, neural network models trained via simple algorithms such as stochastic gradient descent continue to predict well on yet unseen test data. In such overparameterized scenarios there may be infinitely many globally optimal network parameters consistent with the training data, so the key challenge is to understand which network parameters (stochastic) gradient descent converges to and what their properties are. Indeed, a recent series of papers [16, 52, 56] suggests that solutions found by first-order methods tend to have favorable generalization properties. As DNNs begin to be deployed in safety-critical applications, the need for a foundational understanding of their noise robustness and their unique prediction capabilities intensifies. This paper focuses on an intriguing phenomenon: overparameterized neural networks are surprisingly robust to label noise when first-order methods with early stopping are used to train them [25]. To observe this phenomenon, consider Figure 1, where we perform experiments on the MNIST data set.
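The early-stopping idea described above can be sketched on a toy problem. This is only an illustrative sketch, not the paper's setup: it swaps the neural network for an overparameterized linear model (more features than samples), uses synthetic Gaussian data instead of MNIST, and assumes a clean validation set is available for choosing the stopping point. The names `make_toy_data` and `train_with_early_stopping` are hypothetical.

```python
import numpy as np


def make_toy_data(n=50, d=200, noise_frac=0.2, seed=0):
    """Synthetic +/-1 labels from a random linear rule; a fraction of
    the training labels is flipped to simulate label noise (assumption)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    X_val = rng.normal(size=(n, d))
    y = np.sign(X @ w_true)
    y_val = np.sign(X_val @ w_true)  # validation labels stay clean
    y_noisy = y.copy()
    flip = rng.choice(n, size=int(noise_frac * n), replace=False)
    y_noisy[flip] *= -1
    return X, y_noisy, X_val, y_val


def train_with_early_stopping(X, y_noisy, X_val, y_val,
                              max_iters=500, patience=20):
    """Gradient descent on mean squared loss for a linear model.
    Returns the iterate with the lowest validation loss seen so far,
    stopping once it has not improved for `patience` iterations."""
    n, d = X.shape
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    lr = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    best_w, best_loss, best_iter = w.copy(), np.inf, 0
    history = []
    for t in range(max_iters):
        val_loss = np.mean((X_val @ w - y_val) ** 2)
        history.append(val_loss)
        if val_loss < best_loss:
            best_w, best_loss, best_iter = w.copy(), val_loss, t
        elif t - best_iter >= patience:
            break  # validation loss stopped improving
        grad = X.T @ (X @ w - y_noisy) / n
        w -= lr * grad
    return best_w, best_iter, history


if __name__ == "__main__":
    X, y_noisy, X_val, y_val = make_toy_data()
    w, best_iter, history = train_with_early_stopping(X, y_noisy, X_val, y_val)
    print("stopped at iteration", best_iter,
          "with validation loss", history[best_iter])
```

Because the model has more parameters than training samples, running gradient descent to convergence would fit the flipped labels exactly; returning the best validation iterate is one simple way to stop before that happens.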

artificial intelligence, machine learning, neural network

Country:
- North America > United States > California
  - Los Angeles County > Los Angeles (0.28)
  - Riverside County > Riverside (0.14)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Health & Medicine (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (1.00)
  - Neural Networks (1.00)
