Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Functions

Neural Information Processing Systems 

We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with $1\le\alpha\le 2$, which holds in a wide range of applications in machine learning and signal processing. This condition ensures that any first-order stationary point is a global optimum. We show that SCRN improves the best-known sample complexity of stochastic gradient descent. Even under a weak version of the gradient dominance property, which is applicable to policy-based reinforcement learning (RL), SCRN achieves the same improvement over stochastic policy gradient methods. Additionally, we show that the average sample complexity of SCRN can be reduced to $\mathcal{O}(\epsilon^{-2})$ for $\alpha=1$ using a variance reduction method with time-varying batch sizes.
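
For concreteness, the gradient dominance condition with exponent $\alpha$ is commonly stated as follows (a sketch of the standard form; the constant $\tau_f$ and the weak RL variant used in the paper may differ):
\[
  \|\nabla f(x)\|^{\alpha} \;\ge\; \tau_f\,\bigl(f(x) - f^{*}\bigr)
  \quad \text{for all } x, \qquad 1 \le \alpha \le 2,
\]
where $f^{*}$ denotes the global minimum of $f$. Taking $\alpha = 2$ recovers the Polyak--Lojasiewicz (PL) inequality, and any $x$ with $\nabla f(x) = 0$ immediately satisfies $f(x) = f^{*}$, which is why first-order stationary points are global optima under this condition.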
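The SCRN iteration itself follows the standard stochastic cubic-regularized Newton template (a generic sketch in the style of Nesterov--Polyak cubic regularization; the batch estimates $g_t$, $H_t$ and the penalty parameter $M$ are placeholders, not the paper's exact choices):
\[
  x_{t+1} = x_t + \operatorname*{arg\,min}_{h}\;
  \langle g_t, h\rangle + \tfrac{1}{2}\langle h, H_t h\rangle + \tfrac{M}{6}\|h\|^{3},
\]
where $g_t$ and $H_t$ are minibatch estimates of the gradient and Hessian at $x_t$. The cubic penalty keeps each step trustworthy even when $H_t$ is indefinite, which is what allows second-order information to be exploited with only sampled derivatives.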