Quantitative Convergences of Lie Group Momentum Optimizers
Explicit, momentum-based dynamics that optimize functions defined on Lie groups can be constructed via variational optimization and momentum trivialization. Structure-preserving time discretizations can then turn these dynamics into optimization algorithms. This article investigates two such discretizations: Lie Heavy-Ball, a known splitting scheme, and Lie NAG-SC, which is newly proposed. Their convergence rates are explicitly quantified under $L$-smoothness and local strong convexity assumptions. Lie NAG-SC provides acceleration over the momentumless case, i.e. Riemannian gradient descent, whereas Lie Heavy-Ball does not. Compared to existing accelerated optimizers for general manifolds, both Lie Heavy-Ball and Lie NAG-SC are computationally cheaper and easier to implement, thanks to their use of the group structure: only a gradient oracle and the exponential map are required, not the logarithm map or parallel transport, which are computationally costly.
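To make the last point concrete, below is a minimal Python sketch of a left-trivialized, heavy-ball-style momentum step on SO(n). It illustrates the general idea only (momentum kept in the Lie algebra, group update via the exponential map); it is not the paper's exact Lie Heavy-Ball or Lie NAG-SC scheme, and the step size h, damping gamma, and the example objective are placeholder assumptions.

import numpy as np
from scipy.linalg import expm

def trivialized_grad(grad_f, X):
    # Left-trivialized Riemannian gradient on SO(n): project the Euclidean
    # gradient into the Lie algebra so(n) (skew-symmetric matrices).
    G = X.T @ grad_f(X)
    return 0.5 * (G - G.T)

def lie_momentum_step(X, xi, grad_f, h=1e-2, gamma=0.9):
    # Damp the algebra-valued momentum, add the trivialized gradient force,
    # then move on the group using only the matrix exponential.
    xi_new = gamma * xi - h * trivialized_grad(grad_f, X)
    X_new = X @ expm(h * xi_new)
    return X_new, xi_new

# Toy usage: minimize f(X) = ||X - A||_F^2 over SO(3) for a fixed target A in SO(3).
A = np.diag([1.0, -1.0, -1.0])
grad_f = lambda X: 2.0 * (X - A)
X, xi = np.eye(3), np.zeros((3, 3))
for _ in range(200):
    X, xi = lie_momentum_step(X, xi, grad_f)

Note that no logarithm map or parallel transport appears anywhere in the update, which is the computational advantage the abstract emphasizes.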
Understanding Accelerated Gradient Methods: Lyapunov Analyses and Hamiltonian Assisted Interpretations
We formulate two classes of first-order algorithms, more general than those previously studied, for minimizing smooth and strongly convex functions and, respectively, smooth and convex functions. We establish sufficient conditions, via new discrete Lyapunov analyses, for achieving accelerated convergence rates that match Nesterov's methods in the strongly convex and general convex settings. Next, we study the convergence of the limiting ordinary differential equations (ODEs) and point out notable gaps that currently remain between the convergence properties of the corresponding algorithms and ODEs. Finally, we propose a novel class of discrete algorithms, called the Hamiltonian assisted gradient method, directly based on a Hamiltonian function and several interpretable operations, and demonstrate meaningful and unified interpretations of our acceleration conditions.
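For reference, the acceleration benchmark invoked here is Nesterov's method for $L$-smooth, $\mu$-strongly convex objectives (NAG-SC). Its iteration and classical rate, recalled below as standard background rather than as the paper's generalized algorithm classes, are

\begin{align*}
x_{k+1} &= y_k - \tfrac{1}{L}\,\nabla f(y_k),\\
y_{k+1} &= x_{k+1} + \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\,(x_{k+1}-x_k), \qquad \kappa = L/\mu,\\
f(x_k) - f(x^\star) &\le O\!\big((1 - 1/\sqrt{\kappa})^{k}\big),
\end{align*}

which improves on the $O((1-1/\kappa)^k)$ rate of plain gradient descent; the discrete Lyapunov analyses in the paper give sufficient conditions under which more general first-order iterations retain rates of this type.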
Understanding the Acceleration Phenomenon via High-Resolution Differential Equations
Shi, Bin, Du, Simon S., Jordan, Michael I., Su, Weijie J.
Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms---Nesterov's accelerated gradient method for strongly convex functions (NAG-SC) and Polyak's heavy-ball method---we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they allow the identification of a term that we refer to as "gradient correction" that is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods. We also use the high-resolution ODE framework to study Nesterov's accelerated gradient method for (non-strongly) convex functions, uncovering a hitherto unknown result---that NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.
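For concreteness, with step size $s$ and strong-convexity parameter $\mu$, the two discrete methods and their high-resolution ODEs take the following form (coefficients recalled from this line of work; read them as a sketch of the framework rather than a substitute for the paper's statements):

\begin{align*}
\text{heavy-ball:}\quad & x_{k+1} = x_k + \beta\,(x_k - x_{k-1}) - s\,\nabla f(x_k),\\
\text{NAG-SC:}\quad & x_{k+1} = x_k + \beta\,(x_k - x_{k-1}) - s\,\nabla f(x_k) - \beta s\,\big(\nabla f(x_k) - \nabla f(x_{k-1})\big),\\
\text{heavy-ball ODE:}\quad & \ddot X + 2\sqrt{\mu}\,\dot X + (1+\sqrt{\mu s})\,\nabla f(X) = 0,\\
\text{NAG-SC ODE:}\quad & \ddot X + 2\sqrt{\mu}\,\dot X + \sqrt{s}\,\nabla^2 f(X)\,\dot X + (1+\sqrt{\mu s})\,\nabla f(X) = 0,
\end{align*}

with $\beta = (1-\sqrt{\mu s})/(1+\sqrt{\mu s})$. The extra term $\sqrt{s}\,\nabla^2 f(X)\,\dot X$ (discretely, $\beta s\,(\nabla f(x_k)-\nabla f(x_{k-1}))$) is the gradient correction: it appears only in NAG-SC and is the term the abstract identifies as responsible for the qualitative difference in convergence between the two methods.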