Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization
Dongruo Zhou, Pan Xu, Quanquan Gu
Neural Information Processing Systems
We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each epoch, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient whose variance is further reduced in each epoch. For smooth functions, the proposed algorithm converges to an approximate first-order stationary point (i.e., $\|\nabla F(\mathbf{x})\|_2 \leq \epsilon$) within $\tilde{O}(n \land \epsilon^{-2} + \epsilon^{-3} \land n^{1/2}\epsilon^{-2})$ stochastic gradient evaluations, where $n$ is the number of component functions, $\epsilon$ is the optimization error, and $\tilde{O}(\cdot)$ hides logarithmic factors. This improves the best known gradient complexity of SVRG, $O(n + n^{2/3}\epsilon^{-2})$, and the best known gradient complexity of SCSG, $O(\epsilon^{-5/3} \land n^{2/3}\epsilon^{-2})$.
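To make the nested construction concrete, below is a minimal NumPy sketch of a semi-stochastic gradient built from $K+1$ reference points: a base estimate at the coarsest reference plus telescoping minibatch corrections between consecutive references. All names (`nested_semi_stochastic_gradient`, `grad_fn`, `refs`, `batch_sizes`) are hypothetical, and the actual algorithm's reference-point update schedule and batch-size choices are omitted; this is an illustration of the estimator's shape, not the authors' full method.

```python
import numpy as np

def nested_semi_stochastic_gradient(grad_fn, refs, batch_sizes, n, rng):
    """Nested semi-stochastic gradient from K+1 reference points.

    refs[0..K] are the nested reference points (refs[-1] is the current
    iterate); batch_sizes[l] is the minibatch size at level l; grad_fn(x, idx)
    returns the average gradient of the component functions indexed by idx.
    """
    idx0 = rng.choice(n, size=batch_sizes[0], replace=False)
    g = grad_fn(refs[0], idx0)  # base estimate at the coarsest reference
    # Telescoping corrections between consecutive reference points. Each level
    # evaluates both points on the SAME minibatch, so the difference has small
    # variance whenever consecutive references are close to each other.
    for l in range(1, len(refs)):
        idx = rng.choice(n, size=batch_sizes[l], replace=False)
        g += grad_fn(refs[l], idx) - grad_fn(refs[l - 1], idx)
    return g

# Toy usage on least-squares components f_i(x) = 0.5 * (a_i @ x - b_i)**2.
rng = np.random.default_rng(0)
n, d = 1000, 5
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
grad_fn = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
x = rng.standard_normal(d)
refs = [x.copy(),
        x + 0.01 * rng.standard_normal(d),
        x + 0.02 * rng.standard_normal(d)]  # K = 2 nested references
g = nested_semi_stochastic_gradient(grad_fn, refs,
                                    batch_sizes=[256, 64, 16], n=n, rng=rng)
```

With $K = 1$, `refs = [snapshot, current_iterate]`, and `batch_sizes[0] = n`, this reduces to the SVRG estimator $\nabla F(\tilde{\mathbf{x}}) + \nabla f_B(\mathbf{x}_t) - \nabla f_B(\tilde{\mathbf{x}})$; the extra levels let the variance shrink further without recomputing the expensive base term at every step.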