Asynchronous Parallel Greedy Coordinate Descent

Neural Information Processing Systems

In this paper, we propose and study an Asynchronous parallel Greedy Coordinate Descent (Asy-GCD) algorithm for minimizing a smooth function with bounded constraints. At each iteration, workers asynchronously conduct greedy coordinate descent updates on a block of variables. In the first part of the paper, we analyze the theoretical behavior of Asy-GCD and prove a linear convergence rate. In the second part, we develop an efficient kernel SVM solver based on Asy-GCD in the shared memory multi-core setting. Since our algorithm is fully asynchronous--each core does not need to idle and wait for the other cores--the resulting algorithm enjoys good speedup and outperforms existing multi-core kernel SVM solvers including asynchronous stochastic coordinate descent and multi-core LIBSVM.


Feature Clustering for Accelerating Parallel Coordinate Descent

Neural Information Processing Systems

Large-scale 1-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, includingclassification and regression problems. High-performance algorithms andimplementations are critical to efficiently solving these problems. Building upon previous work on coordinate descent algorithms for 1-regularized problems, we introduce a novel family of algorithms called block-greedy coordinate descentthat includes, as special cases, several existing algorithms such as SCD, Greedy CD, Shotgun, and Thread-Greedy. We give a unified convergence analysis for the family of block-greedy algorithms. The analysis suggests that block-greedy coordinate descent can better exploit parallelism if features are clustered sothat the maximum inner product between features in different blocks is small. Our theoretical convergence analysis is supported with experimental results usingdata from diverse real-world applications. We hope that algorithmic approaches and convergence analysis we provide will not only advance the field, but will also encourage researchers to systematically explore the design space of algorithms for solving large-scale 1-regularization problems.


Nearest Neighbor based Greedy Coordinate Descent

Neural Information Processing Systems

Increasingly, optimization problems in machine learning, especially those arising from high-dimensional statistical estimation, have a large number of variables. Modern statistical estimators developed over the past decade have statistical or sample complexity that depends only weakly on the number of parameters when there is some structure to the problem, such as sparsity. A central question is whether similar advances can be made in their computational complexity as well. In this paper, we propose strategies that indicate that such advances can indeed be made. In particular, we investigate the greedy coordinate descent algorithm, and note that performing the greedy step efficiently weakens the costly dependence on the problem size provided the solution is sparse. We then propose a suite of methods that perform these greedy steps efficiently by a reduction to nearest neighbor search. We also devise a more amenable form of greedy descent for composite non-smooth objectives; as well as several approximate variants of such greedy descent. We develop a practical implementation of our algorithm that combines greedy coordinate descent with locality sensitive hashing. Without tuning the latter data structure, we are not only able to significantly speed up the vanilla greedy method, but also outperform cyclic descent when the problem size becomes large. Our results indicate the effectiveness of our nearest neighbor strategies, and also point to many open questions regarding the development of computational geometric techniques tailored towards first-order optimization methods.


Asynchronous Parallel Greedy Coordinate Descent

Neural Information Processing Systems

At each iteration, workers asynchronously conduct greedy coordinate descent updates on a block of variables. In the first part of the paper, we analyze the theoretical behavior of Asy-GCD and prove a linear convergence rate. In the second part, we develop an efficient kernel SVM solver based on Asy-GCD in the shared memory multi-core setting. Since our algorithm is fully asynchronous---each core does not need to idle and wait for the other cores---the resulting algorithm enjoys good speedup and outperforms existing multi-core kernel SVM solvers including asynchronous stochastic coordinate descent and multi-core LIBSVM. Papers published at the Neural Information Processing Systems Conference.


Accelerated Stochastic Greedy Coordinate Descent by Soft Thresholding Projection onto Simplex

Neural Information Processing Systems

In this paper we study the well-known greedy coordinate descent (GCD) algorithm to solve $\ell_1$-regularized problems and improve GCD by the two popular strategies: Nesterov's acceleration and stochastic optimization. Firstly, we propose a new rule for greedy selection based on an $\ell_1$-norm square approximation which is nontrivial to solve but convex; then an efficient algorithm called ``SOft ThreshOlding PrOjection (SOTOPO)'' is proposed to exactly solve the $\ell_1$-regularized $\ell_1$-norm square approximation problem, which is induced by the new rule. Based on the new rule and the SOTOPO algorithm, the Nesterov's acceleration and stochastic optimization strategies are then successfully applied to the GCD algorithm. The resulted algorithm called accelerated stochastic greedy coordinate descent (ASGCD) has the optimal convergence rate $O(\sqrt{1/\epsilon})$; meanwhile, it reduces the iteration complexity of greedy selection up to a factor of sample size. Both theoretically and empirically, we show that ASGCD has better performance for high-dimensional and dense problems with sparse solution.