Gradient Descent Converges to Minimizers
Lee, Jason D., Simchowitz, Max, Jordan, Michael I., Recht, Benjamin
Saddle points have long been regarded as a tremendous obstacle for continuous optimization. There are many well known examples when worst case initialization of gradient descent provably converge to saddle points [20, Section 1.2.3], and hardness results which show that finding even a local minimizer of nonconvex functions is NP-Hard in the worst case [19]. However, such worst-case analyses have not daunted practitioners, and high quality solutions of continuous optimization problems are readily found by a variety of simple algorithms. Building on tools from the theory of dynamical systems, this paper demonstrates that, under very mild regularity conditions, saddle points are indeed of little concern for the gradient method.
Mar-4-2016