The Two Phases of Gradient Descent in Deep Learning

May-12-2017, 23:12:10 GMT–@machinelearnbot

Thanks to great experimental work by several research groups studying the behavior of Stochastic Gradient Descent (SGD), we are collectively gaining a much clearer understanding as to what happens in the neighborhood of training convergence. The story begins with the best paper award winner for ICLR 2017, "Rethinking Generalization". This paper I first discussed several months ago in a blog post "Rethinking Generalization in Deep Learning". One interesting observation in that paper is the role of SGD. Indeed, in neural networks, we almost always choose our model as the output of running stochastic gradient descent.

artificial intelligence, deep learning, machine learning, (12 more...)

@machinelearnbot

May-12-2017, 23:12:10 GMT

News Web Page

Add feedback

Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Genre:
- Personal > Honors (0.56)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (1.00)
  - Neural Networks > Deep Learning (0.91)