Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Neural Information Processing Systems
Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of second-order moments. In this paper, we provide convergence guarantees for SGD under state-dependent, heavy-tailed noise with potentially infinite variance, for a class of strongly convex objectives. In the case where the p-th moment of the noise exists for some p ∈ [1, 2), we first identify a condition on the Hessian, coined 'p-positive (semi-)definiteness', that leads to an interesting interpolation between the positive semi-definite cone (p = 2) and the cone of diagonally dominant matrices with non-negative diagonal entries (p = 1). Under this condition, we provide a convergence rate for the distance to the global optimum in L^p.
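To make the setting concrete, below is a minimal, hypothetical sketch (not taken from the paper) of SGD on a strongly convex quadratic with symmetric alpha-stable gradient noise: for a tail index alpha in (1, 2), such noise has a finite p-th moment for any p < alpha but infinite variance, matching the regime p ∈ [1, 2) described above. The quadratic objective, the 1/k step-size schedule, and the use of SciPy's levy_stable sampler are illustrative assumptions; the paper's analysis additionally covers state-dependent noise and a broader class of strongly convex objectives.

```python
import numpy as np
from scipy.stats import levy_stable

# Illustrative sketch (assumptions, not the authors' experiments):
# SGD on f(x) = 0.5 * x^T A x with additive symmetric alpha-stable noise.
rng = np.random.default_rng(0)
d = 10
A = np.diag(np.linspace(1.0, 5.0, d))   # Hessian of the quadratic (positive definite)
x_star = np.zeros(d)                    # global optimum of f
alpha, p = 1.5, 1.2                     # tail index, and moment order p < alpha

x = rng.normal(size=d)
errors = []
for k in range(1, 5001):
    noise = levy_stable.rvs(alpha, 0.0, size=d, random_state=rng)
    grad = A @ (x - x_star) + noise     # stochastic gradient with heavy-tailed noise
    x = x - (1.0 / k) * grad            # decreasing step size 1/k
    errors.append(np.linalg.norm(x - x_star) ** p)

# Track the p-th moment of the error, ||x_k - x*||^p, which remains finite
# even though the noise (and hence the iterates) has infinite variance.
print(f"final ||x - x*||^p: {errors[-1]:.4f}")
```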