H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
Son Nguyen, Lizhang Chen, Bo Liu, Qiang Liu
arXiv.org Artificial Intelligence
Optimization algorithms play an indispensable role in the remarkable development of AI, especially in modern deep learning. In recent years, breakthroughs in architectural innovation [3] and practical applications [37] have further increased the need for efficient training paradigms: optimization algorithms that strike a balance between performance and manageable memory cost. Stochastic gradient descent (SGD) is widely regarded as the standard algorithm for training deep learning models, supported by extensive theoretical foundations [31, 32, 34, 43]. However, it requires careful hyperparameter tuning and often exhibits slow convergence on many contemporary architectures [10, 36, 40]. Meanwhile, adaptive gradient methods such as Adam [17], AdaGrad [12], and AMSGrad [29] adjust the learning rate of each parameter throughout optimization by accumulating second-order gradient statistics.
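To make the memory trade-off concrete, the sketch below shows a minimal Adam-style update in NumPy (not the paper's H-Fac method): each parameter keeps running first- and second-moment estimates of its gradient, so the optimizer state is twice the size of the model itself. The function name `adam_step` and the toy quadratic objective are illustrative choices, not from the source.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: per-parameter learning rates derived from
    running first- and second-moment estimates of the gradient."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad**2    # second moment (uncentered variance)
    m_hat = m / (1 - beta1**t)               # bias correction for zero init
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimize f(theta) = theta^2 starting from theta = 2.0.
theta = np.array([2.0])
m = np.zeros_like(theta)  # extra state: same shape as the parameters
v = np.zeros_like(theta)  # extra state: same shape as the parameters
for t in range(1, 2001):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

The two auxiliary buffers `m` and `v` are exactly the per-parameter statistics whose memory footprint memory-efficient optimizers such as H-Fac aim to reduce, e.g. by factorizing them into low-rank components.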
Jun-17-2024