Simple and optimal high-probability bounds for strongly-convex stochastic gradient descent

Harvey, Nicholas J. A., Liaw, Christopher, Randhawa, Sikander

arXiv.org Machine Learning 

We consider stochastic gradient descent algorithms for minimizing a non-smooth, strongly-convex function. Several forms of this algorithm, including suffix averaging, are known to achieve the optimal $O(1/T)$ convergence rate in expectation. We study a simple, non-uniform averaging strategy of Lacoste-Julien et al. (2011) and prove that it achieves the optimal $O(1/T)$ convergence rate with high probability. Our proof uses a recently developed generalization of Freedman's inequality. Finally, we compare several of these algorithms experimentally and show that this non-uniform averaging strategy outperforms many standard techniques, and does so with smaller variance.
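As a rough illustration of the averaging strategy mentioned in the abstract, the sketch below (my own assumption about the setup, not code from the paper) runs projected stochastic subgradient descent on a mu-strongly-convex, non-smooth objective (L2-regularized hinge loss) with the step size 2/(mu*(t+1)) commonly paired with the Lacoste-Julien et al. scheme, and maintains the non-uniform average that weights iterate t proportionally to t. The problem data, projection radius, and constants are illustrative choices, not values from the paper.

```python
import numpy as np

# Minimal sketch (not from the paper): projected stochastic subgradient descent
# on a mu-strongly-convex, non-smooth objective
#     F(w) = mu/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i <x_i, w>)
# with step size eta_t = 2 / (mu * (t + 1)) and the non-uniform average
#     w_bar_T = sum_{t=1}^T t * w_t / sum_{t=1}^T t
# attributed to Lacoste-Julien et al. (2011) in the abstract above.

rng = np.random.default_rng(0)
n, d, mu, T = 1000, 20, 0.1, 5000          # illustrative sizes and constants
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true + 0.5 * rng.normal(size=n))

w = np.zeros(d)
w_bar = np.zeros(d)                        # running non-uniform average
weight_sum = 0.0
radius = 1.0 / np.sqrt(mu)                 # heuristic projection radius

for t in range(1, T + 1):
    i = rng.integers(n)                    # sample one data point
    margin = y[i] * (X[i] @ w)
    subgrad = mu * w - (y[i] * X[i] if margin < 1 else 0.0)
    eta = 2.0 / (mu * (t + 1))             # step size for strong convexity
    w = w - eta * subgrad
    norm = np.linalg.norm(w)               # project onto a Euclidean ball
    if norm > radius:
        w *= radius / norm
    weight_sum += t                        # iterate t gets weight t
    w_bar += (t / weight_sum) * (w - w_bar)  # incremental weighted average

def objective(v):
    return 0.5 * mu * v @ v + np.mean(np.maximum(0.0, 1.0 - y * (X @ v)))

print("last iterate F(w_T):   ", objective(w))
print("weighted avg F(w_bar): ", objective(w_bar))
```

In practice the weighted average can be maintained incrementally, as above, so the scheme costs no more memory than keeping the last iterate; the paper's contribution is the high-probability (rather than in-expectation) guarantee for this average.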
