Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Neural Information Processing Systems 

This work proposes a novel analysis of stochastic gradient descent (SGD) for non-convex and smooth optimization. In the case of sub-Gaussian and centered noise, we prove that, with probability 1-\delta, the number of iterations to reach a precision \varepsilon for the squared gradient norm is O(\varepsilon^{-2}\ln(1/\delta)). In the case of centered and integrable heavy-tailed noise, we show that, while the expectation of the iterates may be infinite, the squared gradient norm still converges with probability 1-\delta in O(\varepsilon^{-p}\delta^{-q}) iterations, where p, q > 2. This result shows that heavy-tailed noise on the gradient slows down the convergence of SGD without preventing it, proving that SGD is robust to gradient noise with unbounded variance, a setting of interest for Deep Learning. In addition, it indicates that choosing a step size proportional to T^{-1/b}, where b is the tail-parameter of the noise and T is the number of iterations, leads to the best convergence rates.
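The sketch below is not taken from the paper; it is a minimal illustration of the setting the abstract describes: SGD on a smooth non-convex objective with centered heavy-tailed gradient noise (Student-t with b degrees of freedom, so the variance is infinite for b <= 2 while the mean exists for b > 1) and a constant step size proportional to T^{-1/b}. All function names, constants, and the toy objective are hypothetical choices for illustration only.

```python
import numpy as np

def sgd_heavy_tailed(grad, x0, T, b=1.5, c=0.1, noise_scale=0.1, seed=0):
    """Run T steps of SGD with a constant step size proportional to T^{-1/b}.

    `grad` is the true gradient of a smooth (possibly non-convex) objective.
    The stochastic gradient is corrupted by centered heavy-tailed noise
    (Student-t with b degrees of freedom: unbounded variance when b <= 2).
    Returns the trajectory of squared gradient norms.
    """
    rng = np.random.default_rng(seed)
    eta = c * T ** (-1.0 / b)  # step size ~ T^{-1/b}, matching the abstract's recommendation
    x = np.array(x0, dtype=float)
    sq_norms = []
    for _ in range(T):
        g = grad(x)                      # true gradient at the current iterate
        sq_norms.append(float(g @ g))    # squared gradient norm, the quantity analyzed
        noise = noise_scale * rng.standard_t(df=b, size=x.shape)  # centered, heavy-tailed
        x = x - eta * (g + noise)        # SGD step on the noisy gradient
    return np.array(sq_norms)

# Toy smooth non-convex objective f(x) = sum(x_i^2 + 3 sin^2(x_i)) and its gradient.
grad_f = lambda x: 2 * x + 6 * np.sin(x) * np.cos(x)

sq_norms = sgd_heavy_tailed(grad_f, x0=np.full(10, 3.0), T=10_000, b=1.5)
print("min squared gradient norm along the trajectory:", sq_norms.min())
```

Under the abstract's result, the best squared gradient norm reached along such a trajectory should still shrink with T despite the infinite-variance noise, only at a slower rate than in the sub-Gaussian case.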