Reviews: A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Neural Information Processing Systems 

Originality: This paper provides a clear and deep analysis of a multi-stage accelerated SGD algorithm. The results show that the expected function value gap is bounded by an exponential decay term plus a sublinear term arising from the noise. In the single-stage, zero-noise special case, the analysis recovers the deterministic rate, while the noise term matches the lower bound $O(\sigma^2/n)$. The paper contains sufficient novel results and is competitive with related work. In particular, the main results reveal how to choose the right time to switch from a constant stepsize to a decaying stepsize, a choice that is crucial for the overall performance of stochastic algorithms.
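
For concreteness, the bound described above has the following schematic form; this is only my hedged paraphrase of the two-term structure (exponential bias decay plus an $O(\sigma^2/n)$ noise term), with illustrative constants $c, C_1, C_2$ and initial gap $\Delta_0$, not the paper's exact statement:

\[
  \mathbb{E}\big[f(x_n) - f^*\big]
  \;\le\;
  C_1\, e^{-c\, n}\, \Delta_0
  \;+\;
  C_2\, \frac{\sigma^2}{n},
\]

where the first term dominates in the noiseless (deterministic) regime and the second term matches the $O(\sigma^2/n)$ lower bound in the noise-dominated regime.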