AITopics | learning rate phase

Collaborating Authors

learning rate phase

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Implicit bias of deep linear networks in the large learning rate phase

Huang, Wei, Du, Weitao, Da Xu, Richard Yi, Liu, Chunrui

arXiv.org Machine LearningDec-16-2020

Most theoretical studies explaining the regularization effect in deep learning have only focused on gradient descent with a sufficient small learning rate or even gradient flow (infinitesimal learning rate). Such researches, however, have neglected a reasonably large learning rate applied in most practical applications. In this work, we characterize the implicit bias effect of deep linear networks for binary classification using the logistic loss in the large learning rate regime, inspired by the seminal work by Lewkowycz et al. [26] in a regression setting with squared loss. They found a learning rate regime with a large stepsize named the catapult phase, where the loss grows at the early stage of training and eventually converges to a minimum that is flatter than those found in the small learning rate regime. We claim that depending on the separation conditions of data, the gradient descent iterates will converge to a flatter minimum in the catapult phase. We rigorously prove this claim under the assumption of degenerate data by overcoming the difficulty of the non-constant Hessian of logistic loss and further characterize the behavior of loss and Hessian for non-separable data. Finally, we demonstrate that flatter minima in the space spanned by non-separable data along with the learning rate in the catapult phase can lead to better generalization empirically.

arxiv preprint arxiv, gradient descent, learning rate, (12 more...)

arXiv.org Machine Learning

2011.12547

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

The large learning rate phase of deep learning: the catapult mechanism

Lewkowycz, Aitor, Bahri, Yasaman, Dyer, Ethan, Sohl-Dickstein, Jascha, Gur-Ari, Guy

arXiv.org Machine LearningMar-4-2020

The choice of initial learning rate can have a profound effect on the performance of deep networks. We present a class of neural networks with solvable training dynamics, and confirm their predictions empirically in practical deep learning settings. The networks exhibit sharply distinct behaviors at small and large learning rates. The two regimes are separated by a phase transition. In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks. At large learning rates the model captures qualitatively distinct phenomena, including the convergence of gradient descent dynamics to flatter minima. One key prediction of our model is a narrow range of large, stable learning rates. We find good agreement between our model's predictions and training dynamics in realistic deep learning settings. Furthermore, we find that the optimal performance in such settings is often found in the large learning rate phase. We believe our results shed light on characteristics of models trained at different learning rates. In particular, they fill a gap between existing wide neural network theory, and the nonlinear, large learning rate, training dynamics relevant to practice.

catapult phase, initialization, learning rate phase, (13 more...)

arXiv.org Machine Learning

2003.02218

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback