A Geometric Structure of Acceleration and Its Role in Making Gradients Small Fast

Oct-10-2024, 19:28:59 GMT–Neural Information Processing Systems

Since Nesterov's seminal 1983 work, many accelerated first-order optimization methods have been proposed, but their analyses lack a common unifying structure. In this work, we identify a geometric structure satisfied by a wide range of first-order accelerated methods. Using this geometric insight, we present several novel generalizations of accelerated methods. Most interesting among them is a method that reduces the squared gradient norm at an \mathcal{O}(1/K^4) rate in the prox-grad setup, faster than the \mathcal{O}(1/K^3) rates of Nesterov's FGM or Kim and Fessler's FPGM-m.
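For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of Nesterov's fast gradient method (FGM) that records the squared gradient norm at each iterate. It is not the new method proposed in the paper. The momentum recurrence shown is one common textbook form, and the function name `fgm`, the diagonal quadratic test problem, and the iteration count are illustrative assumptions, not taken from the source.

```python
import numpy as np


def fgm(grad, x0, L, num_iters):
    """Nesterov's fast gradient method for an L-smooth convex function.

    Returns the last primary iterate and the squared gradient norms
    measured at the extrapolated points y_k.
    """
    x = x0.copy()
    y = x0.copy()
    theta = 1.0
    sq_grad_norms = []
    for _ in range(num_iters):
        g = grad(y)
        sq_grad_norms.append(float(g @ g))
        # Gradient step from the extrapolated point.
        x_next = y - g / L
        # Standard momentum-parameter recurrence.
        theta_next = (1.0 + np.sqrt(1.0 + 4.0 * theta ** 2)) / 2.0
        # Extrapolation (momentum) step.
        y = x_next + ((theta - 1.0) / theta_next) * (x_next - x)
        x, theta = x_next, theta_next
    return x, sq_grad_norms


# Hypothetical test problem: f(x) = 0.5 * x^T A x - b^T x with diagonal A.
A = np.diag(np.linspace(0.1, 10.0, 20))
b = np.ones(20)
L_smooth = 10.0  # largest eigenvalue of A


def f(x):
    return 0.5 * x @ A @ x - b @ x


def grad_f(x):
    return A @ x - b


x_start = np.zeros(20)
x_final, sq_norms = fgm(grad_f, x_start, L_smooth, num_iters=200)
best_sq_norm = min(sq_norms)
```

Here `best_sq_norm` is the smallest squared gradient norm seen over the run, which is the kind of quantity the rates in the abstract describe. The abstract's \mathcal{O}(1/K^4) result concerns the prox-grad setup, which this smooth unconstrained sketch does not cover.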

accelerated method, geometric structure, gradient small fast, (2 more...)

Technology:
- Information Technology > Artificial Intelligence (0.51)