Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization
Dongruo Zhou, Pan Xu, Quanquan Gu
Neural Information Processing Systems
We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each epoch, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient whose variance is further reduced in each epoch. For smooth functions, the proposed algorithm converges to an approximate first-order stationary point (i.e., $\|\nabla F(\mathbf{x})\|_2 \leq \epsilon$) within $\tilde{O}(n \land \epsilon^{-2} + \epsilon^{-3} \land n^{1/2}\epsilon^{-2})$ stochastic gradient evaluations, where $n$ is the number of component functions, $\epsilon$ is the optimization error, and $\tilde{O}(\cdot)$ hides logarithmic factors. This improves the best known gradient complexity of SVRG, $O(n + n^{2/3}\epsilon^{-2})$, and the best known gradient complexity of SCSG, $O(\epsilon^{-5/3} \land n^{2/3}\epsilon^{-2})$.
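To make the nested construction concrete, below is a minimal NumPy sketch of a semi-stochastic gradient built from $K+1$ reference points: a base estimate at the coarsest reference plus telescoping minibatch corrections between consecutive references. All names (`nested_semi_stochastic_gradient`, `grad_fn`, `refs`, `batch_sizes`) are hypothetical, and the actual algorithm's reference-point update schedule and batch-size choices are omitted; this is an illustration of the estimator's shape, not the authors' full method.

```python
import numpy as np

def nested_semi_stochastic_gradient(grad_fn, refs, batch_sizes, n, rng):
    """Nested semi-stochastic gradient from K+1 reference points.

    refs[0..K] are the nested reference points (refs[-1] is the current
    iterate); batch_sizes[l] is the minibatch size at level l; grad_fn(x, idx)
    returns the average gradient of the component functions indexed by idx.
    """
    idx0 = rng.choice(n, size=batch_sizes[0], replace=False)
    g = grad_fn(refs[0], idx0)  # base estimate at the coarsest reference
    # Telescoping corrections between consecutive reference points. Each level
    # evaluates both points on the SAME minibatch, so the difference has small
    # variance whenever consecutive references are close to each other.
    for l in range(1, len(refs)):
        idx = rng.choice(n, size=batch_sizes[l], replace=False)
        g += grad_fn(refs[l], idx) - grad_fn(refs[l - 1], idx)
    return g

# Toy usage on least-squares components f_i(x) = 0.5 * (a_i @ x - b_i)**2.
rng = np.random.default_rng(0)
n, d = 1000, 5
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
grad_fn = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
x = rng.standard_normal(d)
refs = [x.copy(),
        x + 0.01 * rng.standard_normal(d),
        x + 0.02 * rng.standard_normal(d)]  # K = 2 nested references
g = nested_semi_stochastic_gradient(grad_fn, refs,
                                    batch_sizes=[256, 64, 16], n=n, rng=rng)
```

With $K = 1$, `refs = [snapshot, current_iterate]`, and `batch_sizes[0] = n`, this reduces to the SVRG estimator $\nabla F(\tilde{\mathbf{x}}) + \nabla f_B(\mathbf{x}_t) - \nabla f_B(\tilde{\mathbf{x}})$; the extra levels let the variance shrink further without recomputing the expensive base term at every step.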