Direct Acceleration of SAGA using Sampled Negative Momentum

Jun-28-2018–arXiv.org Machine Learning

We focus on achieving very high accuracy for Problem (1), although for practical optimization tasks, such as supervised learning, low empirical risk may result in high generalization error. In this paper, we treat Problem (1) as a pure optimization problem. When F(·) in Problem (1) is strongly convex, traditional analysis shows that gradient descent (GD) yields a fast linear convergence rate but with a high per-iteration cost, and thus may not be suitable for problems with a very large n. As an alternative for large problems, SGD [Robbins and Monro, 1951] uses only one or a mini-batch of gradients in each iteration, and thus enjoys significantly lower per-iteration complexity than GD. However, due to the variance of gradient estimator, vanilla SGD is shown to yield only a sub-linear convergence rate. Recently, stochastic variance reduced methods (e.g., SAG [Roux et al., 2012], SVRG [Johnson and Zhang, 2013], SAGA [Defazio et al., 2014], and their proximal variants, such as [Schmidt et al., 2017], [Xiao and Zhang, 2014] and [Konečný et al., 2016]) were proposed to solve Problem (1). All these methods are equipped with various variance reduction techniques, which help them achieve low per-iteration complexities comparable with SGD and at the same time maintain the fast linear convergence rate of GD.

artificial intelligence, machine learning, saga, (16 more...)

arXiv.org Machine Learning

Jun-28-2018

arXiv.org PDF

Add feedback

Country:
- Asia
  - Middle East > Jordan (0.04)
  - China > Hong Kong (0.04)

Genre:
- Research Report (0.65)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found