A Discrete Variational Derivation of Accelerated Methods in Optimization
Campos, Cédric M., Mahillo, Alejandro, de Diego, David Martín
arXiv.org Artificial Intelligence
Many of the new developments in machine learning are connected with gradient-based optimization methods. Recently, these methods have been studied from a variational perspective (Betancourt et al., 2018). This has opened up the possibility of introducing variational and symplectic methods using geometric integration. In this paper, we introduce variational integrators (Marsden and West, 2001), which allow us to derive different methods for optimization. Using both Hamilton's principle and the Lagrange-d'Alembert principle, we derive two families of optimization methods, in one-to-one correspondence, that generalize Polyak's heavy ball (Polyak, 1964) and Nesterov's accelerated gradient (Nesterov, 1983). The second family mimics the behavior of Nesterov's method, reducing the oscillations of classical momentum methods. However, since the systems considered are explicitly time-dependent, the preservation of symplecticity of autonomous systems occurs here solely on the fibers.
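For orientation, the two classical methods that the abstract says are generalized can be sketched in their standard textbook forms. This is a minimal illustration only, not the paper's variational integrators: the function names, the constant step size and momentum, and the quadratic test problem are all assumptions chosen for the example.

```python
import numpy as np

def heavy_ball(grad, x0, step, momentum, iters):
    """Polyak's heavy ball: gradient evaluated at the current iterate."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(iters):
        x_next = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

def nesterov(grad, x0, step, momentum, iters):
    """Nesterov's accelerated gradient: gradient evaluated at a look-ahead point."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(iters):
        y = x + momentum * (x - x_prev)   # extrapolated (look-ahead) point
        x_next = y - step * grad(y)
        x_prev, x = x, x_next
    return x

# Hypothetical test problem: f(x) = 0.5 * x^T A x, minimized at the origin.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x0 = np.array([1.0, 1.0])

x_hb = heavy_ball(grad, x0, step=0.05, momentum=0.5, iters=200)
x_nag = nesterov(grad, x0, step=0.05, momentum=0.5, iters=200)
```

The only difference between the two updates is where the gradient is evaluated: at the current iterate for heavy ball, and at the extrapolated point for Nesterov's method. That difference is commonly credited with damping the oscillations of classical momentum.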
Nov-20-2022
- Country:
- South America > Brazil
- Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States
- New York (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- New Jersey
- Mercer County > Princeton (0.04)
- Bergen County > Hackensack (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Florida > Palm Beach County
- Boca Raton (0.04)
- California
- San Mateo County > Redwood City (0.04)
- Alameda County > Berkeley (0.04)
- Europe
- Spain
- Galicia > Madrid (0.04)
- Aragón > Zaragoza Province
- Zaragoza (0.04)
- Netherlands
- South Holland > Dordrecht (0.04)
- North Holland > Amsterdam (0.04)
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report (0.82)
- Technology: