Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction

Neural Information Processing Systems 

Lipschitz gradient, without needing to know either the corresponding Lipschitz constants or the oracle's variance, while enjoying the rates characteristic of algorithms that have the
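
The adaptivity claim centers on the AdaGrad stepsizes named in the title: the stepsize is built from observed gradient norms rather than from a known Lipschitz constant or variance bound. Below is a minimal illustrative sketch of the classical AdaGrad-Norm update, not necessarily the paper's exact scheme; the scale parameter `D` (a diameter estimate) and the function name are assumptions for illustration.

```python
import numpy as np

def adagrad_norm_step(x, grad, cum_sq, D):
    """One AdaGrad-Norm update (illustrative sketch, not the paper's
    exact scheme): the stepsize shrinks with the accumulated squared
    gradient norms, so neither the Lipschitz constant nor the
    oracle's variance must be known in advance."""
    cum_sq += float(np.dot(grad, grad))   # accumulate sum of ||g_i||^2
    eta = D / np.sqrt(cum_sq)             # AdaGrad stepsize: D / sqrt(sum ||g_i||^2)
    return x - eta * grad, cum_sq
```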