Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Li, Yitan, Xu, Linli, Zhong, Xiaowei, Ling, Qing

May-21-2016–arXiv.org Machine Learning

With the enormous growth of data size n and model complexity, asynchronous parallel algorithms [1, 2, 3, 4, 5, 6] have become an important tool and have achieved significant success in solving large-scale machine learning problems of the form (1). Asynchronous parallel algorithms distribute computation across multicore systems (shared-memory architecture) or multi-machine systems (parameter-server architecture), whose computational power generally scales with the number of cores or machines. Consequently, the effective design and implementation of asynchronous parallel algorithms is critical for large-scale machine learning. Numerous efforts have been devoted to this topic. Among them, asynchronous stochastic gradient descent is proposed in [1, 2], with its performance guaranteed by theoretical convergence analyses. An asynchronous proximal gradient descent algorithm is designed for the parameter-server architecture in [3], accompanied by distributed optimization software. The convergence rate of asynchronous stochastic gradient descent with a nonconvex objective is analyzed in [4].
