Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization

Zebang Shen, Hui Qian, Tongzhou Mu, Chao Zhang

arXiv.org Artificial Intelligence 

We consider the composite problem min_x F(x) + P(x), where F(x) = (1/n) * sum_{i=1}^{n} f_i(x) is the average of the convex component functions f_i, and P(x) is a block-separable and convex regularization function. We allow both F(x) and P(x) to be non-smooth. In machine learning applications, many tasks can be naturally phrased as the above problem, e.g., the Empirical Risk Minimization (ERM) problem Zhang and Xiao [2015], Friedman et al. [2001]. However, the recent explosive growth of data introduces unprecedented computational challenges in scalability and storage. Consequently, algorithms with fast convergence, small memory footprints, and low per-iteration complexity have been actively pursued in recent years. Most successful approaches adopt stochastic strategies that incorporate randomness into the solving procedure. These approaches follow two parallel tracks: in each iteration, the gradient is estimated from either a mini-batch of samples or a small block of variable coordinates, typically selected by a random procedure. Accessing only a random mini-batch of samples in each iteration, classical Stochastic Gradient Descent (SGD) and its Variance Reduction (VR) variants, such as SVRG Johnson and Zhang [2013], SAGA Defazio et al. [2014], and SAG Schmidt et al. [2013], have gained increasing attention over the past five years.
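To make the two stochastic tracks concrete, the following is a minimal sketch of a generic doubly stochastic proximal-gradient update on an L1-regularized least-squares ERM instance: the gradient is estimated from a random mini-batch of samples, only a random block of coordinates is updated, and the block-separable regularizer is handled by its proximal (soft-thresholding) operator. This is an illustrative assumption, not the accelerated algorithm proposed in the paper; the function names, step size, and synthetic data are hypothetical.

```python
# Hypothetical sketch: a doubly stochastic proximal-gradient step for
# min_x (1/(2n)) * ||A x - b||^2 + lam * ||x||_1.
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (the block-separable regularizer P).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def doubly_stochastic_prox_sgd(A, b, lam=0.1, step=0.05,
                               batch_size=8, block_size=16,
                               iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        # Track 1: random mini-batch of samples for the gradient estimate.
        idx = rng.choice(n, size=batch_size, replace=False)
        residual = A[idx] @ x - b[idx]
        # Track 2: random block of coordinates to update.
        blk = rng.choice(d, size=min(block_size, d), replace=False)
        grad_blk = A[idx][:, blk].T @ residual / batch_size
        # Proximal gradient step restricted to the sampled block.
        x[blk] = soft_threshold(x[blk] - step * grad_blk, step * lam)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 100))
    x_true = np.zeros(100)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(500)
    x_hat = doubly_stochastic_prox_sgd(A, b)
    print("recovered support:", np.flatnonzero(np.abs(x_hat) > 0.1))
```

Per iteration this sketch touches only batch_size rows and block_size coordinates of A, which is what keeps the per-iteration cost and memory footprint low for large n and d.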
