Quasi-hyperbolic momentum and Adam for deep learning
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover. Finally, we propose a QH variant of Adam called QHAdam, and we empirically demonstrate that our algorithms lead to significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE. We hope that these empirical results, combined with the conceptual and practical simplicity of QHM and QHAdam, will spur interest from both practitioners and researchers. PyTorch code is immediately available.
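The core idea is that the QHM update is a ν-weighted average of a plain SGD step and a (damped) momentum step. Below is a minimal NumPy sketch of that update rule as described in the paper; the function name `qhm_step`, the toy usage, and the default hyperparameters (ν = 0.7, β = 0.999, which we take to be the paper's recommended defaults) are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def qhm_step(theta, momentum, grad, lr=0.1, beta=0.999, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) update.

    nu interpolates between plain SGD (nu = 0) and damped momentum
    SGD (nu = 1); intermediate values average the two steps.
    Hyperparameter defaults here are assumptions for illustration.
    """
    momentum = beta * momentum + (1.0 - beta) * grad   # damped momentum buffer
    step = (1.0 - nu) * grad + nu * momentum           # average of SGD and momentum steps
    return theta - lr * step, momentum

# Toy usage on the quadratic loss L(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta = np.array([1.0, -2.0])
momentum = np.zeros_like(theta)
for _ in range(100):
    theta, momentum = qhm_step(theta, momentum, grad=theta)
```

QHAdam, per the abstract, is the analogous variant of Adam; it applies the same ν-weighted averaging to Adam's moment estimates rather than to a single momentum buffer.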
Oct-15-2018