Quasi-hyperbolic momentum and Adam for deep learning
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover. Finally, we propose a QH variant of Adam called QHAdam, and we empirically demonstrate that our algorithms lead to significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE. We hope that these empirical results, combined with the conceptual and practical simplicity of QHM and QHAdam, will spur interest from both practitioners and researchers. PyTorch code is immediately available.
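The core idea is that the QHM update is a ν-weighted average of a plain SGD step and a (damped) momentum step. Below is a minimal NumPy sketch of that update rule as described in the paper; the function name `qhm_step`, the toy usage, and the default hyperparameters (ν = 0.7, β = 0.999, which we take to be the paper's recommended defaults) are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def qhm_step(theta, momentum, grad, lr=0.1, beta=0.999, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) update.

    nu interpolates between plain SGD (nu = 0) and damped momentum
    SGD (nu = 1); intermediate values average the two steps.
    Hyperparameter defaults here are assumptions for illustration.
    """
    momentum = beta * momentum + (1.0 - beta) * grad   # damped momentum buffer
    step = (1.0 - nu) * grad + nu * momentum           # average of SGD and momentum steps
    return theta - lr * step, momentum

# Toy usage on the quadratic loss L(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta = np.array([1.0, -2.0])
momentum = np.zeros_like(theta)
for _ in range(100):
    theta, momentum = qhm_step(theta, momentum, grad=theta)
```

QHAdam, per the abstract, is the analogous variant of Adam; it applies the same ν-weighted averaging to Adam's moment estimates rather than to a single momentum buffer.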
Oct-15-2018