Convergence Rate for the Last Iterate of Stochastic Gradient Descent Schemes

Oct-13-2025–arXiv.org Artificial Intelligence

We study the convergence rate for the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting when the objective function $F$ is globally convex or non-convex whose gradient is $γ$-Hölder. Using only discrete Gronwall's inequality without Robbins-Siegmund theorem, we recover results for both SGD and SHB: $\min_{s\leq t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$ for non-convex objectives and $F(w_{τ\wedge t}) - F_* = o(t^{2γ/(1+γ) \cdot \max(p-1,-2p+1)-\eps})$ for $β\in (0, 1)$, $τ:= \inf \{ t > 0 : F(w_t) = F_*\}$, and $\min_{s \leq t} F(w_s) - F_* = o(t^{p-1})$ for convex objectives $F$ whose minimum is $F_*$. In addition, we proved that SHB with constant momentum parameter $β\in (0, 1)$ attains a convergence rate of $F(w_t) - F_* = O(t^{\max(p-1,-2p+1)} \log^2 \frac{t}δ)$ with probability at least $1-δ$ when $F$ is convex and $γ= 1$ and step size $α_t = Θ(t^{-p})$ with $p \in (\frac{1}{2}, 1)$.

artificial intelligence, assumption 2, machine learning, (14 more...)

arXiv.org Artificial Intelligence

Oct-13-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.28)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found