L4: Practical loss-based stepsize adaptation for deep learning

Feb-21-2018–arXiv.org Machine Learning

We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly on the loss function and rescales the gradient so that each step makes a fixed amount of predicted progress on the loss. We demonstrate its capabilities by strongly improving the performance of the Adam and Momentum optimizers. The enhanced optimizers, run with default hyperparameters, consistently outperform their constant-stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures, including ResNets and the Differential Neural Computer. A prototype implementation as a TensorFlow optimizer is released.
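
The abstract does not spell out the update rule, so the following is only a minimal sketch of one way to read "rescales the gradient in order to make fixed predicted progress on the loss". It assumes a first-order (linearized) prediction of the loss and a known lower bound on it. The names `adaptive_stepsize`, `alpha`, `loss_min`, and `eps`, the value `alpha=0.15`, and the toy quadratic loss are all illustrative assumptions. This is not the released TensorFlow optimizer.

```python
def adaptive_stepsize(loss, loss_min, grad, direction, alpha=0.15, eps=1e-12):
    """Pick a stepsize so the linearized loss drops by alpha * (loss - loss_min).

    Under the linearization L(theta - eta * v) ~ L(theta) - eta * <grad, v>,
    the predicted decrease is eta * <grad, v>; solving for eta gives the ratio below.
    All parameter names and defaults here are hypothetical.
    """
    slope = sum(g * v for g, v in zip(grad, direction))
    return alpha * (loss - loss_min) / (slope + eps)


def quadratic_loss(theta):
    # Toy objective with known minimum 0 at the origin (assumption for the demo).
    return 0.5 * sum(t * t for t in theta)


def quadratic_grad(theta):
    # Gradient of the toy objective above.
    return list(theta)


theta = [3.0, -4.0]
initial_loss = quadratic_loss(theta)

for _ in range(50):
    loss = quadratic_loss(theta)
    grad = quadratic_grad(theta)
    # Plain gradient descent: the update direction is the gradient itself.
    eta = adaptive_stepsize(loss, 0.0, grad, grad)
    theta = [t - eta * g for t, g in zip(theta, grad)]

final_loss = quadratic_loss(theta)
print(initial_loss, final_loss)
```

In this sketch the update direction is the raw gradient. To wrap Momentum or Adam instead, one would pass that optimizer's update vector as `direction`, which is an assumption about how such a scheme could be combined with them rather than a description of the paper's implementation.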

artificial intelligence, learning rate, machine learning

Country:
- Europe > Germany (0.28)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Neural Networks > Deep Learning (1.00)
