The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

Elizabeth Collins-Woodfin, Inbar Seroussi, Begoña García Malaxechebarría, Andrew W. Mackenzie, Elliot Paquette, Courtney Paquette

arXiv.org Machine Learning 

In deterministic optimization, adaptive stepsize strategies, such as line search (see [39] and references therein), AdaGrad-Norm [55], Polyak stepsize [46], and others, were developed to provide stability and to improve efficiency and adaptivity to unknown parameters. While the practical benefits for deterministic optimization problems are well-documented, our understanding of adaptive learning rate strategies for stochastic algorithms is still in its infancy. There are many adaptive learning rate strategies used in machine learning, with many design goals. Some are known to adapt to SGD gradient noise, while others are robust to hyper-parameters (e.g., [4, 59]). Theoretical results for adaptive algorithms tend to focus on guaranteeing minimax-optimal rates, but this theory is not engineered to provide realistic performance comparisons; indeed, many adaptive algorithms are minimax-optimal, so more precise statements are needed to distinguish them. For instance, neither the exact learning rates (or rate schedules) to which these strategies converge nor their dependence on the geometry of the problem is known. Moreover, we often do not know how these adaptive stepsizes compare with well-tuned constant or decaying fixed learning rate stochastic gradient descent (SGD), a comparison that can be viewed as the cost of selecting the adaptive strategy rather than tuning by hand.
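To make one of the named strategies concrete, below is a minimal sketch of SGD with an AdaGrad-Norm stepsize [55] on a toy least-squares problem. The problem setup, the base rate `eta`, and the stability constant `b0` are illustrative assumptions, not the paper's experimental configuration.

```python
# A minimal sketch of SGD with an AdaGrad-Norm stepsize on a toy
# least-squares problem. Dimensions, data model, and hyperparameters
# (eta, b0) are illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 1000
A = rng.standard_normal((n, d)) / np.sqrt(d)  # random design matrix
x_star = rng.standard_normal(d)               # ground-truth parameters
y = A @ x_star                                # noiseless targets

def stochastic_grad(x, i):
    """Gradient of the i-th squared-loss term, (a_i^T x - y_i)^2 / 2."""
    a = A[i]
    return (a @ x - y[i]) * a

eta, b0 = 1.0, 1.0          # base learning rate and stability constant
x = np.zeros(d)
grad_norm_sq_sum = b0 ** 2  # running sum b_k^2 = b_0^2 + sum_j ||g_j||^2

for k in range(5000):
    g = stochastic_grad(x, rng.integers(n))
    grad_norm_sq_sum += g @ g
    # AdaGrad-Norm: one scalar stepsize eta / b_k shared by all coordinates,
    # in contrast to a hand-tuned constant or decaying schedule for SGD.
    x -= (eta / np.sqrt(grad_norm_sq_sum)) * g

print("final risk:", 0.5 * np.mean((A @ x - y) ** 2))
```

The key design point is that the stepsize shrinks automatically as gradient norms accumulate, so the schedule the algorithm effectively converges to, and its dependence on the problem's geometry, is exactly the kind of quantity the abstract says existing theory does not pin down.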
