Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD

Neural Information Processing Systems 

We cast this problem as stochastic convex optimization with heavy-tailed stochastic gradients, and prove that the widely used Clipped-SGD algorithm attains near-optimal sub-Gaussian statistical rates whenever the second moment of the stochastic gradient noise is finite. Notably, the fluctuation terms in our rate (those depending on $\tfrac{1}{\delta}$) are of lower order than the leading term $\Tr(\Sigma)$, where $\Sigma$ denotes the covariance of the clipped stochastic gradients. This improves upon the current best rate of $\sqrt{\frac{\Tr(\Sigma)\ln(\tfrac{1}{\delta})}{T}}$ for Clipped-SGD, known \emph{only} for smooth and strongly convex objectives. Our results also extend to smooth convex and Lipschitz convex objectives. Key to our result is a novel iterative refinement strategy for martingale concentration, improving upon the PAC-Bayes approach of \citet{catoni2018dimension}.
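For context, a minimal sketch of the Clipped-SGD update (the step sizes $\eta_t$ and clipping thresholds $\lambda_t$ are illustrative notation, not necessarily that of the paper): given a stochastic gradient $g_t = \nabla f(x_t;\xi_t)$,
\[
\hat{g}_t \;=\; \min\!\Big(1,\; \frac{\lambda_t}{\|g_t\|_2}\Big)\, g_t,
\qquad
x_{t+1} \;=\; x_t \;-\; \eta_t\, \hat{g}_t .
\]
Clipping caps the norm of each stochastic gradient at $\lambda_t$, which is what renders the heavy-tailed gradient noise amenable to sub-Gaussian concentration arguments.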