Scalable Classifiers with ADMM and Transpose Reduction
Taylor, Gavin (United States Naval Academy) | Xu, Zheng (University of Maryland) | Goldstein, Tom (University of Maryland)
As datasets for machine learning grow larger, parallelization strategies become more and more important. Recent approaches to distributed model fitting rely heavily either on consensus ADMM, where each node solves small sub-problems using only local data, or on stochastic gradient methods that don't scale well to large numbers of cores in a cluster setting. For this reason, GPU clusters have become common prerequisites to large-scale machine learning. This paper describes an unconventional training method that uses alternating direction methods and Bregman iteration to train a variety of machine learning models on CPUs while avoiding the drawbacks of consensus methods and without gradient descent steps. Using transpose reduction strategies, the proposed method reduces the optimization problems to a sequence of minimization sub-steps that can each be solved globally in closed form. The method provides strong scaling in the distributed setting, yielding linear speedups even when split over thousands of cores.
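The closed-form sub-steps rest on a transpose-reduction observation: when the data matrix D is split row-wise across nodes, the Gram matrix D^T D and the product D^T b decompose into sums of small per-node terms, so only these compact quantities need to be communicated before solving the sub-problem exactly. The sketch below illustrates this for a regularized least-squares sub-step; the function names, the single-process "reduce", and the ridge parameter are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the transpose-reduction idea for a distributed
# least-squares sub-step (assumed form; stands in for an MPI reduce).
import numpy as np

def local_statistics(D_i, b_i):
    """Per-node work: form D_i^T D_i and D_i^T b_i from local data only."""
    return D_i.T @ D_i, D_i.T @ b_i

def solve_reduced(stats, lam=1e-3):
    """Closed-form solve of the aggregated (regularized) normal equations."""
    G = sum(g for g, _ in stats)   # sum of local Gram matrices = D^T D
    r = sum(v for _, v in stats)   # sum of local products      = D^T b
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), r)

# Toy usage: data distributed across four "nodes" as row blocks of a tall matrix.
rng = np.random.default_rng(0)
blocks = [(rng.standard_normal((500, 20)), rng.standard_normal(500)) for _ in range(4)]
stats = [local_statistics(D_i, b_i) for D_i, b_i in blocks]   # map step (local work)
x = solve_reduced(stats)                                      # reduce + closed-form solve
```

Because each node communicates only a d-by-d matrix and a length-d vector rather than its raw data, the communication cost is independent of the number of local training examples, which is what makes strong scaling across many CPU cores plausible.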
Feb-4-2017
- Country:
- North America > United States > Maryland (0.28)
- Genre:
- Research Report (0.48)
- Industry:
- Government > Military (0.46)
- Technology: