Convex SGD: Generalization Without Early Stopping

Jan-8-2024–arXiv.org Artificial Intelligence

We consider the generalization error associated with stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes when the number of iterations $T$ and the dataset size $n$ go to zero at arbitrary rates; our bound scales as $\tilde{O}(1/\sqrt{T} + 1/\sqrt{n})$ with step-size $\alpha_t = 1/\sqrt{t}$. In particular, strong convexity is not needed for stochastic gradient descent to generalize well.

assumption, generalization error, gradient descent, (10 more...)

arXiv.org Artificial Intelligence

Jan-8-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Belgium > Wallonia
    - Walloon Brabant > Louvain-la-Neuve (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.96)