Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent
Yitan Li, Linli Xu, Xiaowei Zhong, Qing Ling
With the enormous growth of data size n and model complexity, asynchronous parallel algorithms [1, 2, 3, 4, 5, 6] have become an important tool for solving large-scale machine learning problems of the form (1), and have achieved significant success. These algorithms distribute computation across multicore systems (shared-memory architecture) or multi-machine systems (parameter-server architecture), whose computational power generally scales with the number of cores or machines. Consequently, the effective design and implementation of asynchronous parallel algorithms is critical for large-scale machine learning, and numerous efforts have been devoted to this topic. Among them, asynchronous stochastic gradient descent is proposed in [1, 2], with performance guaranteed by theoretical convergence analyses. An asynchronous proximal gradient descent algorithm is designed for the parameter-server architecture in [3], accompanied by distributed optimization software. The convergence rate of asynchronous stochastic gradient descent with a nonconvex objective is analyzed in [4].
May-21-2016