Sharp Analysis of Stochastic Optimization under Global Kurdyka-Lojasiewicz Inequality

Oct-11-2024, 09:11:53 GMT–Neural Information Processing Systems

We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies global Kurdyka-{\L}ojasiewicz (KL) inequality and the queries from stochastic gradient oracles satisfy mild expected smoothness assumption. We first introduce a general framework to analyze Stochastic Gradient Descent (SGD) and its associated nonlinear dynamics under the setting. As a byproduct of our analysis, we obtain a sample complexity of \mathcal{O}(\epsilon {-(4-\alpha)/\alpha}) for SGD when the objective satisfies the so called \alpha -P{\L} condition, where \alpha is the degree of gradient domination. Furthermore, we show that a modified SGD with variance reduction and restarting (PAGER) achieves an improved sample complexity of \mathcal{O}(\epsilon {-2/\alpha}) when the objective satisfies the average smoothness assumption. This leads to the first optimal algorithm for the important case of \alpha 1 which appears in applications such as policy optimization in reinforcement learning.

global kurdyka-lojasiewicz inequality, sharp analysis, stochastic optimization, (6 more...)

Neural Information Processing Systems

Oct-11-2024, 09:11:53 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Mathematical & Statistical Methods (0.91)
  - Machine Learning > Statistical Learning
    - Gradient Descent (0.91)