Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

Dongruo Zhou, Pan Xu, Quanquan Gu

Neural Information Processing Systems 

We study finite-sum nonconvex optimization problems, where the objective function is an average of n nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with conventional stochastic variance reduced gradient (SVRG) algorithm that uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses K +1nested reference points to build a semi-stochastic gradient to further reduce its variance in each iteration. For smooth nonconvex functions, the proposed algorithm converges to an -approximate first-order stationary point (i.e., krF (x)k