A Unifying View on Implicit Bias in Training Linear Neural Networks
Chulhee Yun, Shankar Krishnan, Hossein Mobahi
Overparametrized neural networks admit infinitely many solutions that achieve zero training error, and these global minima can differ widely in generalization performance. Moreover, training a neural network is a high-dimensional nonconvex problem that is typically intractable to solve. Nevertheless, the success of deep learning indicates that first-order methods such as gradient descent and stochastic gradient descent (GD/SGD) not only (a) succeed in finding global minima, but also (b) are biased towards solutions that generalize well; why this happens has largely remained a mystery in the literature. To explain part (a) of the phenomenon, there is a growing literature studying the convergence of GD/SGD on overparametrized neural networks (e.g., Du et al. (2018a,b); Allen-Zhu et al. (2018); Zou et al. (2018); Jacot et al. (2018); Oymak and Soltanolkotabi (2020), and many more). There are also convergence results that focus on linear networks, i.e., networks without nonlinear activations (Bartlett et al., 2018; Arora et al., 2019a; Wu et al., 2019; Du and Hu, 2019; Hu et al., 2020). These results typically focus on the convergence of the loss, and hence do not address which of the many global minima is reached.
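The implicit bias in (b) can be made concrete in the simplest linear setting. Below is a minimal sketch, not taken from the paper, assuming an underdetermined least-squares problem: gradient descent initialized at zero both reaches zero training error (part (a)) and, among the infinitely many interpolating solutions, converges to the one of minimum l2 norm (part (b)).

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's method):
# GD on (1/2n)||Xw - y||^2 with more parameters than samples,
# started at w = 0, stays in the row space of X and therefore
# converges to the minimum-l2-norm interpolating solution.
rng = np.random.default_rng(0)
n, d = 20, 100                       # fewer samples than parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                      # the initialization drives the bias
lr = 1e-2
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y) / n  # gradient of the squared loss

# Closed-form minimum-norm interpolant: w* = X^T (X X^T)^{-1} y
w_min_norm = X.T @ np.linalg.solve(X @ X.T, y)
print(np.linalg.norm(X @ w - y))     # ~0: a global minimum is found
print(np.linalg.norm(w - w_min_norm))  # ~0: and it is the min-norm one
```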
Oct-6-2020