Zhong, Xiaowei
Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent
Li, Yitan, Xu, Linli, Zhong, Xiaowei, Ling, Qing
Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal stochastic gradient descent (DAP-SGD), to minimize an objective function that is the composite of the average of multiple empirical losses and a regularization term. Unlike the traditional asynchronous proximal stochastic gradient descent (TAP-SGD) in which the master carries much of the computation load, the proposed algorithm off-loads the majority of computation tasks from the master to workers, and leaves the master to conduct simple addition operations. This strategy yields an easy-to-parallelize algorithm, whose performance is justified by theoretical convergence analyses. To be specific, DAP-SGD achieves an $O(\log T/T)$ rate when the step-size is diminishing and an ergodic $O(1/\sqrt{T})$ rate when the step-size is constant, where $T$ is the number of total iterations.
A Nonconvex Relaxation Approach for Rank Minimization Problems
Zhong, Xiaowei (University of Science and Technology of China) | Xu, Linli (University of Science and Technology of China) | Li, Yitan (University of Science and Technology of China) | Liu, Zhiyuan (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China)
Recently, solving rank minimization problems by leveraging nonconvex relaxations has received significant attention. Some theoretical analyses demonstrate that it can provide a better approximation of original problems than convex relaxations. However, designing an effective algorithm to solve nonconvex optimization problems remains a big challenge. In this paper, we propose an Iterative Shrinkage-Thresholding and Reweighted Algorithm (ISTRA) to solve rank minimization problems using the nonconvex weighted nuclear norm as a low rank regularizer. We prove theoretically that under certain assumptions our method achieves a high-quality local optimal solution efficiently. Experimental results on synthetic and real data show that the proposed ISTRA algorithm outperforms state-of-the-art methods in both accuracy and efficiency.