Gradient Methods with Online Scaling

Gao, Wenzhi, Chu, Ya-Chi, Ye, Yinyu, Udell, Madeleine

Nov-5-2024–arXiv.org Artificial Intelligence

We introduce a framework to accelerate the convergence of gradient-based methods with online learning. The framework learns to scale the gradient at each iteration through an online learning algorithm and provably accelerates gradient-based methods asymptotically. In contrast with previous literature, where convergence is established based on worst-case analysis, our framework provides a strong convergence guarantee with respect to the optimal scaling matrix for the iteration trajectory. For smooth strongly convex optimization, our results provide an $O(\kappa^\star \log(1/\varepsilon)$) complexity result, where $\kappa^\star$ is the condition number achievable by the optimal preconditioner, improving on the previous $O(\sqrt{n}\kappa^\star \log(1/\varepsilon))$ result. In particular, a variant of our method achieves superlinear convergence on convex quadratics. For smooth convex optimization, we show for the first time that the widely-used hypergradient descent heuristic improves on the convergence of gradient descent.

algorithm, convergence, gradient descent, (15 more...)

arXiv.org Artificial Intelligence

Nov-5-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)
- Asia > China
  - Shanghai > Shanghai (0.04)

Genre:
- Research Report > New Finding (0.48)

Industry:
- Education (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning > Statistical Learning
    - Gradient Descent (1.00)