Directional Smoothness and Gradient Methods: Convergence and Adaptivity
Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower
We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper bounds on the objective. Minimizing these upper bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general …

One way to avoid global smoothness of f is to use local Lipschitz continuity of the gradient ("local smoothness"). Local smoothness uses different Lipschitz constants for different neighbourhoods, thus avoiding global assumptions and obtaining improved rates. However, such analyses typically require the iterates to be bounded, in which case local smoothness reduces to L-smoothness over a compact set (Malitsky & Mishchenko, 2020). Boundedness can be enforced in a variety of ways: Zhang & Hong (2020) break optimization into stages, Patel & Berahas (2022) develop a stopping-time framework, and Lu & Mei (2023) use line-search and a modified update. These approaches either modify the underlying optimization algorithm, require local smoothness oracles (Park et al., 2021), or rely on highly complex arguments.
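The abstract's notion of a strongly adapted step-size can be made concrete for quadratics. The sketch below is a minimal illustration, not the authors' implementation: it takes the point-wise directional smoothness D(y, x) = ||grad f(y) - grad f(x)|| / ||y - x|| and picks eta_k so that eta_k * D(x_{k+1}, x_k) <= 1, where x_{k+1} = x_k - eta_k * grad f(x_k). For the convex quadratic f(x) = 0.5 x^T A x - b^T x the gradient difference is -eta_k * A g_k, so the implicit condition collapses to the explicit choice eta_k = ||g_k|| / ||A g_k||. Function names, the termination rule, and the test problem are illustrative assumptions.

```python
import numpy as np

def directional_smoothness(grad, x_next, x):
    """Point-wise directional smoothness D(x_next, x) = ||grad(x_next) - grad(x)|| / ||x_next - x||."""
    return np.linalg.norm(grad(x_next) - grad(x)) / np.linalg.norm(x_next - x)

def gd_strongly_adapted_quadratic(A, b, x0, n_iters=100):
    """Gradient descent on f(x) = 0.5 x^T A x - b^T x with step-sizes satisfying
    eta_k * D(x_{k+1}, x_k) <= 1.  For quadratics, grad f(x_{k+1}) - grad f(x_k)
    = -eta_k * A g_k, so D(x_{k+1}, x_k) = ||A g_k|| / ||g_k|| does not depend on
    eta_k and the largest admissible step is eta_k = ||g_k|| / ||A g_k||."""
    grad = lambda x: A @ x - b
    x = x0.astype(float).copy()
    for _ in range(n_iters):
        g = grad(x)
        Ag = A @ g
        if np.linalg.norm(Ag) == 0.0:  # stationary point reached
            break
        eta = np.linalg.norm(g) / np.linalg.norm(Ag)  # adapted step-size
        x_next = x - eta * g
        # Sanity check: the adapted condition holds (with equality) for quadratics.
        assert eta * directional_smoothness(grad, x_next, x) <= 1.0 + 1e-8
        x = x_next
    return x

# Illustrative usage on a random positive-definite quadratic.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)
b = rng.standard_normal(5)
x_hat = gd_strongly_adapted_quadratic(A, b, np.zeros(5))
print(np.linalg.norm(A @ x_hat - b))  # gradient norm at the returned iterate
```

Because eta_k >= 1 / lambda_max(A), the resulting descent f(x_{k+1}) <= f(x_k) - (eta_k / 2) ||g_k||^2 still makes progress at least as fast as fixed-step GD with 1 / lambda_max, while taking much longer steps along well-conditioned directions; this is the path-dependent behaviour the abstract refers to.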
arXiv.org Artificial Intelligence
Mar-6-2024