Fundamental Benefit of Alternating Updates in Minimax Optimization

Lee, Jaewook, Cho, Hanseul, Yun, Chulhee

Feb-16-2024–arXiv.org Artificial Intelligence

The Gradient Descent-Ascent (GDA) algorithm, designed to solve minimax optimization problems, takes the descent and ascent steps either simultaneously (Sim-GDA) or alternately (Alt-GDA). While Alt-GDA is commonly observed to converge faster, the performance gap between the two is not yet well understood theoretically, especially in terms of global convergence rates. To address this theory-practice gap, we present fine-grained convergence analyses of both algorithms for strongly-convex-strongly-concave and Lipschitz-gradient objectives. Our new iteration complexity upper bound for Alt-GDA is strictly smaller than the lower bound for Sim-GDA; i.e., Alt-GDA is provably faster. Moreover, we propose Alternating-Extrapolation GDA (Alex-GDA), a general algorithmic framework that subsumes Sim-GDA and Alt-GDA, whose main idea is to alternately take gradients from extrapolations of the iterates. We show that Alex-GDA satisfies a smaller iteration complexity bound, identical to that of the Extra-gradient method, while requiring fewer gradient computations. We also prove that Alex-GDA enjoys linear convergence for bilinear problems, for which both Sim-GDA and Alt-GDA fail to converge at all.
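To make the difference between the two update orders concrete, here is a minimal sketch on a toy problem. The objective f(x, y) = (mu/2) x^2 + b x y - (mu/2) y^2, the shared step size `lr`, the starting point, and all function names are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not implement Alex-GDA. The only difference between the two step functions is whether the ascent step uses the old x (Sim-GDA) or the freshly updated x (Alt-GDA).

```python
import math


def grad(x, y, mu, b):
    """Partial derivatives of the toy objective
    f(x, y) = mu/2 * x**2 + b*x*y - mu/2 * y**2 (an illustrative assumption)."""
    return mu * x + b * y, b * x - mu * y


def sim_gda_step(x, y, lr, mu, b):
    """Simultaneous GDA: both players use gradients at the same point (x, y)."""
    gx, gy = grad(x, y, mu, b)
    return x - lr * gx, y + lr * gy


def alt_gda_step(x, y, lr, mu, b):
    """Alternating GDA: the ascent step sees the already-updated x."""
    gx, _ = grad(x, y, mu, b)
    x_new = x - lr * gx
    _, gy = grad(x_new, y, mu, b)
    return x_new, y + lr * gy


def run(step, steps, lr, mu, b, x0=1.0, y0=1.0):
    """Iterate a step function and return the distance to the saddle point (0, 0)."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = step(x, y, lr, mu, b)
    return math.hypot(x, y)


if __name__ == "__main__":
    # Strongly-convex-strongly-concave case (mu > 0): compare final distances.
    print("Sim-GDA:", run(sim_gda_step, 2000, lr=0.1, mu=0.1, b=1.0))
    print("Alt-GDA:", run(alt_gda_step, 2000, lr=0.1, mu=0.1, b=1.0))
    # Purely bilinear case (mu = 0): neither method approaches the saddle point.
    print("Sim-GDA, bilinear:", run(sim_gda_step, 1000, lr=0.1, mu=0.0, b=1.0))
    print("Alt-GDA, bilinear:", run(alt_gda_step, 1000, lr=0.1, mu=0.0, b=1.0))
```

With these particular settings, the alternating iterates end closer to the saddle point than the simultaneous ones in the mu > 0 case. Setting mu = 0 gives a purely bilinear problem on which the simultaneous iterates move away from the origin and the alternating iterates stay at roughly their initial distance. This is a single toy instance and says nothing about the general rates, which are the subject of the paper's analysis.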

alternating update, fundamental benefit, iteration complexity

Country:
- Asia > Russia (0.04)
- Europe
  - Russia (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.14)
  - Switzerland > Basel-City
    - Basel (0.04)
  - Italy > Sicily
    - Palermo (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Optimization (1.00)
    - Search (0.72)
  - Machine Learning > Statistical Learning
    - Gradient Descent (0.34)