Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization
Rong Ge, Zhize Li, Weiyao Wang, Xiang Wang
Nonconvex optimization is widely used in machine learning. Recently, for problems like matrix sensing (Bhojanapalli et al., 2016), matrix completion (Ge et al., 2016), and certain objectives for neural networks (Ge et al., 2017b), it was shown that all local minima are also globally optimal, so simple local search algorithms can be used to solve these problems. For a convex function f(x), a point is a local and global minimum whenever its gradient is zero: ∇f(x) = 0. However, for nonconvex functions, a point with zero gradient can also be a saddle point. To avoid converging to saddle points, recent work (Ge et al., 2015; Jin et al., 2017a,b) proves stronger guarantees showing that local search algorithms converge to ɛ-approximate second-order stationary points - points with small gradients and almost positive semi-definite Hessians (see Definition 1). In theory, Xu et al. (2018) and Allen-Zhu and Li (2017) independently showed that finding a second-order stationary point is not much harder than finding a first-order stationary point - they give reduction algorithms, Neon/Neon2, that convert any algorithm finding first-order stationary points into one that converges to second-order stationary points.
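For concreteness, the following is a sketch of the definition of an ɛ-approximate second-order stationary point commonly used in this line of work (e.g., Jin et al., 2017a); the paper's own Definition 1 may use a slightly different parametrization, and the Hessian-Lipschitz constant ρ below is an assumption of this sketch.

% Sketch of the standard definition; assumes f has a ρ-Lipschitz Hessian
% and that the amsthm "definition" environment is available.
\begin{definition}[$\epsilon$-approximate second-order stationary point]
For a function $f$ with $\rho$-Lipschitz Hessian, a point $x$ is an
$\epsilon$-approximate second-order stationary point if
\[
  \|\nabla f(x)\| \le \epsilon
  \qquad \text{and} \qquad
  \lambda_{\min}\!\bigl(\nabla^2 f(x)\bigr) \ge -\sqrt{\rho\,\epsilon},
\]
i.e., the gradient is small and the Hessian is almost positive semi-definite.
\end{definition}

Under this definition, a first-order stationary point only satisfies the gradient condition, so a saddle point with a strongly negative Hessian eigenvalue is excluded from the second-order notion.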
May 1, 2019