Reviews: Gradient Descent Can Take Exponential Time to Escape Saddle Points
–Neural Information Processing Systems
It has recently been shown that, when all the saddle points of a non-convex function are "strict saddles", gradient descent with a (reasonable) random initialization converges to a local minimizer with probability one. For a randomly perturbed version of gradient descent, the convergence rate can additionally be shown to be polynomial in the problem parameters. This article proves that no such convergence rate holds for the unperturbed version: there exist reasonably smooth functions with only strict saddle points, and natural initialization schemes, such that gradient descent requires a number of steps that is exponential in the dimension to find a local minimum. I liked this article very much. It answers a very natural question: gradient descent is an extremely classical and very simple algorithm.
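To make the contrast concrete, here is a minimal sketch (not from the paper) of vanilla versus perturbed gradient descent on a toy two-dimensional function with a single strict saddle at the origin. The objective, step size, and the choice to add noise at every step are illustrative assumptions of mine; the perturbed method analyzed in the literature injects noise only when the gradient is small, and the paper's exponential-time lower bound relies on a more elaborate high-dimensional construction.

```python
import numpy as np

# Toy strict-saddle objective (illustrative, not the paper's construction):
# f(x, y) = x^2 + (y^2 - 1)^2 / 4 has a strict saddle at the origin
# (Hessian eigenvalues 2 and -1) and local minima at (0, +/-1).
def grad_f(v):
    x, y = v
    return np.array([2.0 * x, y**3 - y])

def gradient_descent(grad, x0, eta=0.01, steps=20_000, perturb_radius=0.0, seed=0):
    """Plain gradient descent; if perturb_radius > 0, add small uniform noise
    to every step (a simplified stand-in for perturbed gradient descent)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if perturb_radius > 0.0:
            g = g + rng.uniform(-perturb_radius, perturb_radius, size=x.shape)
        x = x - eta * g
    return x

# Initialized on the saddle's stable manifold (y = 0), the unperturbed iterates
# converge to the saddle at the origin, while the perturbed run escapes to a
# local minimizer near (0, +/-1).
print(gradient_descent(grad_f, [1.0, 0.0]))                       # ~ [0, 0]
print(gradient_descent(grad_f, [1.0, 0.0], perturb_radius=1e-3))  # ~ [0, +/-1]
```

This toy only shows the qualitative phenomenon at a single saddle; the paper's result is stronger, exhibiting functions and natural random initializations for which the unperturbed method needs a number of steps exponential in the dimension.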