Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD

Neural Information Processing Systems 

We cast this problem as stochastic convex optimization with heavy-tailed stochastic gradients, and prove that the widely used Clipped-SGD algorithm attains near-optimal sub-Gaussian statistical rates whenever the second moment of the stochastic gradient noise is finite. Notably, the fluctuation terms in our rate (those depending on $\tfrac{1}{\delta}$) are of lower order than the leading term $\Tr(\Sigma)$, where $\Sigma$ denotes the covariance of the clipped stochastic gradients. This improves upon the current best rate of $\sqrt{\frac{\Tr(\Sigma)\ln(\tfrac{1}{\delta})}{T}}$ for Clipped-SGD, known \emph{only} for smooth and strongly convex objectives. Our results also extend to smooth convex and Lipschitz convex objectives. Key to our result is a novel iterative refinement strategy for martingale concentration, improving upon the PAC-Bayes approach of \citet{catoni2018dimension}.
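For context, a minimal sketch of the Clipped-SGD update (the step sizes $\eta_t$ and clipping thresholds $\lambda_t$ are illustrative notation, not necessarily that of the paper): given a stochastic gradient $g_t = \nabla f(x_t;\xi_t)$,
\[
\hat{g}_t \;=\; \min\!\Big(1,\; \frac{\lambda_t}{\|g_t\|_2}\Big)\, g_t,
\qquad
x_{t+1} \;=\; x_t \;-\; \eta_t\, \hat{g}_t .
\]
Clipping caps the norm of each stochastic gradient at $\lambda_t$, which is what renders the heavy-tailed gradient noise amenable to sub-Gaussian concentration arguments.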