Gradient Methods with Online Scaling
Gao, Wenzhi, Chu, Ya-Chi, Ye, Yinyu, Udell, Madeleine
–arXiv.org Artificial Intelligence
We introduce a framework to accelerate the convergence of gradient-based methods with online learning. The framework learns to scale the gradient at each iteration through an online learning algorithm and provably accelerates gradient-based methods asymptotically. In contrast with previous literature, where convergence is established based on worst-case analysis, our framework provides a strong convergence guarantee with respect to the optimal scaling matrix for the iteration trajectory. For smooth strongly convex optimization, our results provide an $O(\kappa^\star \log(1/\varepsilon)$) complexity result, where $\kappa^\star$ is the condition number achievable by the optimal preconditioner, improving on the previous $O(\sqrt{n}\kappa^\star \log(1/\varepsilon))$ result. In particular, a variant of our method achieves superlinear convergence on convex quadratics. For smooth convex optimization, we show for the first time that the widely-used hypergradient descent heuristic improves on the convergence of gradient descent.
arXiv.org Artificial Intelligence
Nov-5-2024
- Country:
- North America > United States
- California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.04)
- Asia > China
- North America > United States
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Education (0.69)
- Technology: