Variance Reduced Stochastic Gradient Descent with Neighbors

Hofmann, Thomas, Lucchi, Aurelien, Lacoste-Julien, Simon, McWilliams, Brian

Feb-14-2020, 11:26:00 GMT – Neural Information Processing Systems

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet it is also known to be slow relative to steepest descent. Recently, variance reduction techniques such as SVRG and SAGA have been proposed to overcome this weakness. Because the variance of their gradient estimates vanishes asymptotically, a constant step size can be maintained, resulting in geometric convergence rates. However, these methods rely either on occasional computations of full gradients at pivot points (SVRG) or on keeping per-data-point corrections in memory (SAGA). As a consequence, they cannot be employed in a streaming setting, and their speed-ups relative to SGD may need a certain number of epochs to materialize.
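To make the two mechanisms named above concrete, here is a minimal sketch of textbook SVRG and SAGA on a small synthetic least-squares problem. It illustrates the baselines the abstract describes, not the neighbor-based method the paper proposes. The problem setup, the function names (`make_problem`, `grad_i`, `full_grad`, `svrg`, `saga`), and the step sizes are assumptions chosen for illustration.

```python
import numpy as np


def make_problem(n=200, d=5, noise=0.1, seed=0):
    """Synthetic least-squares data (hypothetical setup for illustration)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    b = A @ rng.normal(size=d) + noise * rng.normal(size=n)
    return A, b


def grad_i(w, A, b, i):
    # Gradient of the single-point loss 0.5 * (a_i . w - b_i)^2.
    return (A[i] @ w - b[i]) * A[i]


def full_grad(w, A, b):
    # Gradient of the average loss over all n data points.
    return A.T @ (A @ w - b) / len(b)


def svrg(A, b, step, epochs, seed=0):
    """SVRG: recompute a full gradient at a pivot point once per epoch."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        pivot = w.copy()
        mu = full_grad(pivot, A, b)  # the occasional full-gradient pass
        for _ in range(n):
            i = rng.integers(n)
            # Unbiased estimate whose variance shrinks as w and pivot
            # both approach the minimizer.
            w -= step * (grad_i(w, A, b, i) - grad_i(pivot, A, b, i) + mu)
    return w


def saga(A, b, step, epochs, seed=0):
    """SAGA: keep one stored gradient per data point (O(n * d) memory)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    # Table of per-point gradients, initialized at the starting point.
    table = (A @ w - b)[:, None] * A
    avg = table.mean(axis=0)
    for _ in range(epochs * n):
        i = rng.integers(n)
        g = grad_i(w, A, b, i)
        w -= step * (g - table[i] + avg)
        avg += (g - table[i]) / n  # keep the running average in sync
        table[i] = g
    return w


if __name__ == "__main__":
    A, b = make_problem()
    w_star = np.linalg.lstsq(A, b, rcond=None)[0]
    L = np.max(np.sum(A**2, axis=1))  # per-point smoothness constant
    print("SVRG error:", np.linalg.norm(svrg(A, b, 0.1 / L, 100) - w_star))
    print("SAGA error:", np.linalg.norm(saga(A, b, 1 / (3 * L), 100) - w_star))
```

Both routines use a constant step size throughout. The sketch also shows the costs mentioned above: `svrg` needs a full pass over the data at every pivot, and `saga` holds a gradient table with one row per data point, which is why neither fits a streaming setting as written.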

algorithm, stochastic gradient descent, variance reduced stochastic gradient descent

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)