Goto





Nonlinear Acceleration of Stochastic Algorithms

Neural Information Processing Systems

Extrapolation methods use the last few iterates of an optimization algorithm to produce a better estimate of the optimum. They were shown to achieve optimal convergence rates in a deterministic setting using simple gradient iterates. Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm.
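To make the idea concrete, below is a minimal sketch of Anderson-type (regularized nonlinear) extrapolation applied to plain gradient iterates on a quadratic. The function names, step size, regularization constant, and toy problem are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
# Minimal sketch: combine the last few optimizer iterates into a better estimate
# of the optimum via regularized nonlinear (Anderson-type) extrapolation.
# All names and constants here are illustrative choices.
import numpy as np

def extrapolate(iterates, lam=1e-8):
    """Return a weighted combination of the iterates x_0, ..., x_k.

    iterates: list of points produced by an optimizer.
    lam: relative Tikhonov regularization on the residual Gram matrix.
    """
    X = np.array(iterates)                # shape (k+1, d)
    R = np.diff(X, axis=0)                # residuals r_i = x_{i+1} - x_i, shape (k, d)
    M = R @ R.T                           # Gram matrix of residuals
    M += lam * np.linalg.norm(M, 2) * np.eye(len(M))
    c = np.linalg.solve(M, np.ones(len(M)))   # minimize ||c^T R|| s.t. sum(c) = 1
    c /= c.sum()
    return c @ X[:-1]                     # extrapolated estimate of the optimum

# Toy usage: gradient descent on the strongly convex quadratic f(x) = 0.5 x^T A x,
# whose minimizer is x* = 0.
rng = np.random.default_rng(0)
A = np.diag(np.linspace(0.1, 1.0, 20))
x = rng.standard_normal(20)
iterates = [x.copy()]
for _ in range(10):
    x = x - 1.0 * (A @ x)                 # gradient step with step size 1/L
    iterates.append(x.copy())

x_acc = extrapolate(iterates)
# The extrapolated point is usually much closer to the optimum than the last iterate.
print(np.linalg.norm(iterates[-1]), np.linalg.norm(x_acc))
```

In a stochastic setting, as studied in the paper, the iterates fed to such an extrapolation step would come from a stochastic or accelerated stochastic gradient method rather than exact gradient descent.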




Supplementary Material

Neural Information Processing Systems

In this section, we fill in the missing details for proving Theorem 1, including a statement of the concentration bound used to establish Lemma 2, and a proof for Lemma 3. We first provide some


How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Zeyuan Allen-Zhu

Neural Information Processing Systems

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives f(x). However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when f(x) is convex.
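To illustrate the distinction between minimizing the objective value and making the gradient small, here is a hedged sketch of the baseline setting: plain SGD on a convex least-squares objective, reporting both the objective value and the full-gradient norm at the averaged iterate. The problem, step-size schedule, and averaging are illustrative assumptions, not the faster method developed in the paper.

```python
# Sketch of the baseline setting: plain SGD on the convex objective
# f(x) = (1/2n) ||Ax - b||^2, reporting f and ||grad f|| at the averaged iterate.
# Problem, step sizes, and averaging are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 1000, 20, 20000
A = rng.standard_normal((n, d)) / np.sqrt(n)
b = rng.standard_normal(n)

def stochastic_grad(x, i):
    """Gradient of the i-th component (1/2)(a_i^T x - b_i)^2."""
    a = A[i]
    return a * (a @ x - b[i])

x = np.zeros(d)
avg = np.zeros(d)
for t in range(1, T + 1):
    i = rng.integers(n)
    x -= stochastic_grad(x, i) / np.sqrt(t)   # standard O(1/sqrt(t)) step size
    avg += (x - avg) / t                      # running average of the iterates

residual = A @ avg - b
f_val = 0.5 * residual @ residual / n
grad_norm = np.linalg.norm(A.T @ residual / n)
print(f"f(avg) = {f_val:.4f}, ||grad f(avg)|| = {grad_norm:.4f}")
```

The paper's point is that the rate at which this gradient norm shrinks under vanilla SGD is not optimal, and that it can be improved for both convex and nonconvex objectives.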