Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent
Frederik Köhne, Leonie Kreis, Anton Schiela, Roland Herzog
arXiv.org Artificial Intelligence
This paper proposes a novel approach to adaptive step sizes in stochastic gradient descent (SGD) that uses two quantities we have identified as numerically tractable: the Lipschitz constant of the gradient and a notion of local variance in the search directions. The result is a nearly hyperparameter-free algorithm for stochastic optimization that has provable convergence properties on quadratic problems and exhibits truly problem-adaptive behavior on classical image classification tasks. The framework can also incorporate a preconditioner, which makes these adaptive step sizes available to stochastic second-order optimization methods.
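The abstract does not state the step-size rule itself, so the following is only a hypothetical illustration of the general idea of deriving an SGD step size from a numerically estimated Lipschitz constant. It is a swapped-in, simplified scheme, not the authors' algorithm: the step is 1/L_k, where L_k is a finite-difference estimate of the minibatch gradient's Lipschitz constant along the current search direction, applied to a consistent least-squares problem (a quadratic, the setting in which the abstract reports provable convergence). It omits the local-variance term and the preconditioner, and every name in it (`sgd_local_lipschitz`, `batch_grad`, `probe`, and so on) is an assumption made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def batch_grad(x, rows, targets):
    """Minibatch gradient of 0.5 * mean((a.x - y)^2) over the given rows."""
    g = [0.0] * len(x)
    for a, t in zip(rows, targets):
        r = dot(a, x) - t
        for i, ai in enumerate(a):
            g[i] += r * ai
    return [gi / len(rows) for gi in g]


def full_loss(x, A, y):
    return 0.5 * sum((dot(a, x) - t) ** 2 for a, t in zip(A, y)) / len(A)


def sgd_local_lipschitz(A, y, x0, batch_size=16, iters=500, probe=1e-3, seed=0):
    """Hypothetical sketch (not the paper's method): SGD with step 1 / L_k, where
    L_k = ||g_B(x + probe*u) - g_B(x)|| / probe and u = g_B(x) / ||g_B(x)||."""
    rng = random.Random(seed)
    x = list(x0)
    idx = list(range(len(A)))
    for _ in range(iters):
        batch = rng.sample(idx, batch_size)
        rows = [A[j] for j in batch]
        tgts = [y[j] for j in batch]
        g = batch_grad(x, rows, tgts)
        gnorm = norm(g)
        if gnorm < 1e-14:
            continue  # this minibatch is already (numerically) stationary
        u = [gi / gnorm for gi in g]
        # Probe the SAME minibatch at a nearby point along the search direction,
        # so sampling noise does not enter the difference quotient.
        x_probe = [xi + probe * ui for xi, ui in zip(x, u)]
        g_probe = batch_grad(x_probe, rows, tgts)
        L = norm([gp - gi for gp, gi in zip(g_probe, g)]) / probe
        if L <= 0.0:
            continue
        x = [xi - gi / L for xi, gi in zip(x, g)]  # step size alpha_k = 1 / L_k
    return x


# Usage on a synthetic, consistent least-squares problem (hypothetical data).
rng = random.Random(42)
n, N = 5, 64
x_star = [rng.gauss(0, 1) for _ in range(n)]
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(N)]
y = [dot(a, x_star) for a in A]  # every minibatch loss is minimized at x_star
x0 = [0.0] * n
x = sgd_local_lipschitz(A, y, x0)
print(full_loss(x0, A, y), full_loss(x, A, y))
```

On a quadratic the difference quotient equals ||H_B u|| exactly, so the step is alpha_k = ||g_k|| / ||H_B g_k||, which is never smaller than 1/lambda_max(H_B); no global Lipschitz constant has to be supplied by hand. With noisy targets or a non-quadratic loss this simple rule loses that guarantee, which is where a variance-aware correction of the kind the abstract mentions would be needed.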
Nov-28-2023