Variance-Reduced Methods for Machine Learning
Gower, Robert M., Schmidt, Mark, Bach, Francis, Richtárik, Peter
Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving faster convergence than SGD in theory as well as in practice. These speedups underlie the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at non-expert readers. We focus mainly on the convex setting and provide pointers for readers interested in extensions to minimizing non-convex functions.
October 2, 2020
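As a concrete illustration of the variance-reduction idea summarized in the abstract, here is a minimal sketch of one canonical VR method, SVRG, applied to a small finite-sum least-squares problem. The problem data, step size, and epoch length below are illustrative assumptions, not taken from the paper itself.

```python
# Illustrative sketch only (not from the paper): SVRG, a canonical
# variance-reduced method, on the finite-sum least-squares problem
# f(w) = (1/n) * sum_i (a_i^T w - b_i)^2. The data, step size, and
# inner-loop length are assumed for the sake of the example.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))        # feature rows a_i
b = A @ rng.standard_normal(d)         # targets b_i (noiseless for simplicity)

def grad_i(w, i):
    """Gradient of the i-th component f_i(w) = (a_i^T w - b_i)^2."""
    return 2.0 * (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    """Full gradient (1/n) * sum_i grad f_i(w); costs one pass over the data."""
    return 2.0 * A.T @ (A @ w - b) / n

w = np.zeros(d)
step, epochs, m = 0.01, 30, n          # assumed step size and inner-loop length
for _ in range(epochs):
    w_ref = w.copy()                   # snapshot ("reference") point
    g_ref = full_grad(w_ref)           # full gradient at the snapshot
    for _ in range(m):
        i = rng.integers(n)
        # Variance-reduced gradient estimate: still unbiased, but its
        # variance shrinks as w and w_ref both approach the minimizer,
        # which is what allows a constant step size and fast convergence.
        g = grad_i(w, i) - grad_i(w_ref, i) + g_ref
        w -= step * g

print("final objective:", np.mean((A @ w - b) ** 2))
```

The key contrast with plain SGD is the correction term `- grad_i(w_ref, i) + g_ref`: it leaves the estimate unbiased while driving its variance to zero, so no decreasing step-size schedule is needed.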