A Geometric Structure of Acceleration and Its Role in Making Gradients Small Fast
–Neural Information Processing Systems
Since Nesterov's seminal 1983 work, many accelerated first-order optimization methods have been proposed, but their analyses lacks a common unifying structure. In this work, we identify a geometric structure satisfied by a wide range of first-order accelerated methods. Using this geometric insight, we present several novel generalizations of accelerated methods. Most interesting among them is a method that reduces the squared gradient norm with \mathcal{O}(1/K 4) rate in the prox-grad setup, faster than the \mathcal{O}(1/K 3) rates of Nesterov's FGM or Kim and Fessler's FPGM-m.
Neural Information Processing Systems
Oct-10-2024, 19:28:59 GMT
- Technology: