Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation

Sheshukova, Marina, Belomestny, Denis, Durmus, Alain, Moulines, Eric, Naumov, Alexey, Samsonov, Sergey

Oct-7-2024–arXiv.org Machine Learning

We address the problem of solving strongly convex and smooth minimization problems using stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested to combine the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation technique to reduce the asymptotic bias of SGD at the expense of a mild increase of the variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. More precisely, we show that the mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$ where the power $3/4$ can not be improved in general. We also extend this result to the $p$-th moment bound keeping optimal scaling of the remainders with respect to $n$. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.

artificial intelligence, inequality, machine learning, (15 more...)

arXiv.org Machine Learning

Oct-7-2024

arXiv.org PDF

Add feedback

Country:
- Asia > Middle East (0.14)
- Europe (0.28)
- North America > United States (0.14)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning
    - Gradient Descent (1.00)
  - Representation & Reasoning (1.00)