Acceleration of stochastic gradient descent with momentum by averaging: finite-sample rates and asymptotic normality

Kejie Tang, Weidong Liu, Yichen Zhang

May-28-2023–arXiv.org Artificial Intelligence

Stochastic gradient descent (SGD) is a first-order optimization algorithm that approximates the expected loss by averaging the loss function over a mini-batch of training examples. At each iteration, the algorithm updates the model parameters in the direction of the negative gradient of the mini-batch loss, scaled by a learning rate. While SGD is simple and easy to implement, it may converge slowly or oscillate in high-dimensional optimization problems, particularly when the loss function is ill-conditioned or noisy. Momentum-based methods enhance SGD by replacing the raw gradient in the update rule with an exponentially weighted moving average of past gradients, which dampens oscillations and accelerates convergence. In particular, the momentum term gives the update a form of inertia, allowing the algorithm to maintain a more consistent direction of movement even in the presence of noisy gradients. Several variants of momentum-based SGD and related adaptive methods have been proposed, such as Nesterov's accelerated gradient (NAG), Adagrad, and Adam, each with its own strengths and weaknesses.
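
As a rough illustration of the momentum update described above, combined with the iterate averaging named in the title, here is a minimal Python sketch. It is not the authors' exact algorithm: the function names (`sgdm_averaged`, `make_noisy_quadratic_grad`), the noisy quadratic test problem, and the step-size and momentum values are all assumptions made for illustration, and the paper's parameterization of SGD with momentum may differ.

```python
import numpy as np


def sgdm_averaged(grad_fn, x0, n_steps, lr=0.05, beta=0.9, rng=None):
    """Run SGD with momentum; return the last iterate and the average of all iterates.

    Hypothetical helper for illustration only, not taken from the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)      # exponentially weighted moving average of past gradients
    x_bar = np.zeros_like(x)  # running average of the iterates x_1, ..., x_t
    for t in range(1, n_steps + 1):
        g = grad_fn(x, rng)              # noisy (stochastic) gradient at x
        m = beta * m + (1.0 - beta) * g  # momentum: smooth the gradient over time
        x = x - lr * m                   # step along the negative smoothed gradient
        x_bar += (x - x_bar) / t         # incremental mean of the iterates
    return x, x_bar


def make_noisy_quadratic_grad(A, b, noise_std=1.0):
    """Gradient of 0.5 * x^T A x - b^T x plus Gaussian noise (assumed toy problem)."""
    def grad_fn(x, rng):
        return A @ x - b + noise_std * rng.standard_normal(x.shape)
    return grad_fn


# Mildly ill-conditioned toy problem: curvature 1 in one direction, 10 in the other.
A = np.diag([1.0, 10.0])
b = np.array([1.0, -2.0])
x_star = np.linalg.solve(A, b)  # exact minimizer, used only to measure error

x_last, x_avg = sgdm_averaged(
    make_noisy_quadratic_grad(A, b),
    x0=np.zeros(2),
    n_steps=5000,
    rng=np.random.default_rng(0),
)
print("last-iterate error:", np.linalg.norm(x_last - x_star))
print("averaged error:    ", np.linalg.norm(x_avg - x_star))
```

With a constant step size, the last iterate keeps fluctuating around the minimizer because of gradient noise, while the running average smooths those fluctuations out. The sketch is meant to show that qualitative behavior only; it makes no claim about the rates established in the paper.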
