Provably Faster Gradient Descent via Long Steps

Jul-20-2023–arXiv.org Artificial Intelligence

This work proposes a new analysis technique for gradient descent that establishes provably better convergence rates for smooth, convex optimization than prior state-of-the-art textbook proofs. Our theory allows for nonconstant stepsize policies that periodically take larger steps, which may violate the monotone decrease in objective value typically required by such analyses. In fact, contrary to common intuition, we show that periodic long steps, which may increase the objective value in the short term, provably speed up convergence in the long term, with increasingly large gains as longer and longer steps are periodically included. This bears a similarity to accelerated momentum methods, which also forgo a monotone objective decrease at every iteration. Establishing this requires a proof technique capable of analyzing the overall effect of many iterations at once, rather than the typical (naive) one-iteration inductions used in most analyses of first-order methods. Our proofs build on the Performance Estimation Problem (PEP) ideas of [1-3], which cast computing or bounding the worst-case problem instance of a given algorithm as a semidefinite program (SDP). We show that the existence of a feasible solution to a related SDP proves a descent guarantee after applying a corresponding pattern of nonconstant stepsizes, from which faster convergence guarantees follow.
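
The short-term increase and long-term gain from a long step can be seen on a toy problem. This is a minimal sketch for illustration only. The quadratic test function, its eigenvalues, the starting point, the iteration count, and the stepsize pattern (1.5, 1.5, 3.0) are all hypothetical choices. That pattern is not one of the patterns certified by the paper's SDP analysis. For an L-smooth f, a step x⁺ = x − (h/L)∇f(x) satisfies f(x⁺) ≤ f(x) − (h/L)(1 − h/2)‖∇f(x)‖². This standard bound therefore only guarantees a decrease for normalized stepsizes 0 < h < 2, and the step h = 3.0 falls outside that range. A quadratic is also not the worst case covered by the paper's guarantees, so the sketch shows only the qualitative behavior.

```python
# Illustrative sketch only: the test function, eigenvalues, starting point,
# iteration count, and the long-step pattern below are hypothetical choices,
# not values taken from the paper.

# f(x) = 0.5 * sum_i lam_i * x_i^2 is convex and L-smooth with L = max(lam_i);
# its minimum value is f* = 0, attained at x* = 0, so f(x_N) is the optimality gap.
LAMBDAS = [1.0, 0.01]
L = max(LAMBDAS)


def f(x):
    return 0.5 * sum(lam * xi * xi for lam, xi in zip(LAMBDAS, x))


def grad(x):
    return [lam * xi for lam, xi in zip(LAMBDAS, x)]


def gradient_descent(x0, pattern, num_iters):
    """Run x_{k+1} = x_k - (h_k / L) * grad f(x_k), cycling through `pattern`.

    Entries of `pattern` are normalized stepsizes h_k. Returns the objective
    values f(x_0), f(x_1), ..., f(x_N).
    """
    x = list(x0)
    values = [f(x)]
    for k in range(num_iters):
        h = pattern[k % len(pattern)]
        g = grad(x)
        x = [xi - (h / L) * gi for xi, gi in zip(x, g)]
        values.append(f(x))
    return values


def count_increases(values):
    """Number of iterations on which the objective value went up."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur > prev)


x0 = [1.0, 1.0]
N = 30
# Textbook constant stepsize h = 1 (a step of 1/L).
const_vals = gradient_descent(x0, [1.0], N)
# Hypothetical periodic pattern with one long step (h = 3 > 2) per cycle.
long_vals = gradient_descent(x0, [1.5, 1.5, 3.0], N)

for name, vals in [("constant h=1", const_vals), ("pattern 1.5,1.5,3", long_vals)]:
    print(f"{name}: final gap {vals[-1]:.3e}, objective increases: {count_increases(vals)}")
```

On this toy problem, the constant 1/L run decreases the objective monotonically. The run with the periodic long step ends with a smaller gap after 30 iterations, even though its objective rises on some of the h = 3.0 steps. Other eigenvalues or patterns can reverse this outcome (with only the eigenvalue 1.0, the constant step reaches the minimizer in one iteration). For that reason a single run like this is not evidence of a guarantee; the paper's guarantees instead come from the SDP certificates described above.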
