The Two Phases of Gradient Descent in Deep Learning

May-20-2017, 21:20:28 GMT–@machinelearnbot

Thanks to great experimental work by several research groups studying the behavior of Stochastic Gradient Descent (SGD), we are collectively gaining a much clearer understanding of what happens in the neighborhood of training convergence. I first discussed this paper several months ago in a blog post, "Rethinking Generalization in Deep Learning". Leslie Smith and Nicholay Topin recently submitted a paper to the ICLR 2017 workshop, "Exploring Loss Function Topology with Cyclical Learning Rates", in which they discover some peculiar convergence behavior: as you monotonically increase and then decrease the learning rate, there is a transition near the convergence regime where a large enough learning rate perturbs the system right out of its basin into a region of much higher loss. There is, however, one pragmatic takeaway from this paper: "Averaging two models within a basin tend to give a error that is the average of the two models (or less). Averaging two models between basins tend to give an error that is higher than both models".
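To make the two ideas above concrete, here is a minimal sketch in Python. It is not the experiment from the paper. The function names (`triangular_lr`, `toy_loss`, `average_weights`), the default learning-rate bounds, and the one-dimensional double-well loss are all assumptions chosen for illustration. The first function raises the learning rate linearly and then lowers it again. The rest shows, on a toy loss with two basins, how averaging two parameter values behaves within a basin versus across basins.

```python
def triangular_lr(step, base_lr=0.001, max_lr=0.1, half_cycle=100):
    """Hypothetical schedule: rise linearly from base_lr to max_lr over
    half_cycle steps, then fall back to base_lr over the next half_cycle."""
    cycle_pos = step % (2 * half_cycle)
    if cycle_pos <= half_cycle:
        frac = cycle_pos / half_cycle          # rising half of the cycle
    else:
        frac = 2.0 - cycle_pos / half_cycle    # falling half of the cycle
    return base_lr + (max_lr - base_lr) * frac


def toy_loss(w):
    """Toy double-well loss (an assumption, not a real network):
    two basins with minima at w = -1 and w = +1, and a barrier at w = 0."""
    return (w * w - 1.0) ** 2


def average_weights(w_a, w_b):
    """Average two 'models', each reduced here to a single scalar weight."""
    return 0.5 * (w_a + w_b)


# Two models in the same basin (both near w = +1).
same_a, same_b = 0.9, 1.1
same_avg_loss = toy_loss(average_weights(same_a, same_b))
same_mean_loss = 0.5 * (toy_loss(same_a) + toy_loss(same_b))

# Two models in different basins (one near w = -1, one near w = +1).
cross_a, cross_b = -0.9, 1.1
cross_avg_loss = toy_loss(average_weights(cross_a, cross_b))

print("same basin:  averaged-model loss", same_avg_loss,
      "vs mean of losses", same_mean_loss)
print("cross basin: averaged-model loss", cross_avg_loss,
      "vs individual losses", toy_loss(cross_a), toy_loss(cross_b))
```

In this toy setting the averaged weight stays near the bottom of the well when both models share a basin, and lands on the barrier between the wells when they do not. That mirrors the quoted observation, although a real network's loss surface is high-dimensional and need not behave this cleanly.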


