Convex SGD: Generalization Without Early Stopping
Hendrickx, Julien, Olshevsky, Alex
–arXiv.org Artificial Intelligence
We consider the generalization error associated with stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes when the number of iterations $T$ and the dataset size $n$ go to zero at arbitrary rates; our bound scales as $\tilde{O}(1/\sqrt{T} + 1/\sqrt{n})$ with step-size $\alpha_t = 1/\sqrt{t}$. In particular, strong convexity is not needed for stochastic gradient descent to generalize well.
arXiv.org Artificial Intelligence
Jan-8-2024
- Country:
- North America > United States (0.04)
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Belgium > Wallonia
- Walloon Brabant > Louvain-la-Neuve (0.04)
- United Kingdom > England
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report (0.50)
- Technology: