Reviews: A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Neural Information Processing Systems 

This paper designs a multistage accelerated SGD algorithm that does not require knowledge of the noise level or the initial optimality gap, yet obtains optimal convergence rates. This is a well-written paper with good results.
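To make the high-level description concrete, below is a minimal sketch of what a multistage accelerated stochastic gradient scheme of this kind can look like: accelerated gradient steps run in stages, with momentum restarted at each stage boundary while the step size shrinks and the stage length grows. The specific schedules (geometric step-size decay, doubling stage lengths), the function name `multistage_asg`, and all parameter values are illustrative assumptions of this sketch, not the paper's exact method.

```python
import numpy as np

def multistage_asg(grad, x0, L, mu, n_stages=5, n1=100):
    """Sketch of a multistage accelerated stochastic gradient loop.

    grad : stochastic gradient oracle, grad(x) -> noisy gradient at x
    L, mu: smoothness and strong-convexity constants
    Step sizes shrink geometrically and stage lengths double across
    stages (illustrative choices); momentum restarts at each stage.
    """
    x = np.asarray(x0, dtype=float).copy()
    for k in range(1, n_stages + 1):
        alpha = (1.0 / L) * 4.0 ** (-(k - 1))       # illustrative geometric step-size decay
        beta = (1 - np.sqrt(alpha * mu)) / (1 + np.sqrt(alpha * mu))  # AGD momentum
        n_k = n1 * 2 ** (k - 1)                     # illustrative doubling stage length
        y, x_prev = x.copy(), x.copy()
        for _ in range(n_k):
            x_new = y - alpha * grad(y)             # stochastic gradient step
            y = x_new + beta * (x_new - x_prev)     # Nesterov extrapolation
            x_prev = x_new
        x = x_new                                   # next stage restarts from last iterate
    return x

# Toy usage: noisy gradients of f(x) = 0.5 * x'Ax with eigenvalues in [mu, L]
rng = np.random.default_rng(0)
mu, L = 0.1, 1.0
A = np.diag(np.linspace(mu, L, 10))
noisy_grad = lambda x: A @ x + 0.01 * rng.standard_normal(x.shape)
x_hat = multistage_asg(noisy_grad, np.ones(10), L, mu)
```

The point of the multistage structure, as the review notes, is that schedules of this shape can be run without knowing the gradient-noise variance or the initial optimality gap in advance.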