Parallel training of linear models without compromising convergence

Ioannou, Nikolas, Dünner, Celestine, Kourtis, Kornilios, Parnell, Thomas

Nov-5-2018–arXiv.org Machine Learning

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks, and apply optimizations that improve data parallelism, cache line locality, and cache line prefetching of the algorithm. These modifications reduce the per-epoch run-time significantly, but take a toll on algorithm convergence in terms of the required number of epochs. To alleviate these shortcomings of our systems-optimized version, we propose a novel, dynamic data partitioning scheme across threads which allows us to approach the convergence of the sequential version. The combined set of optimizations results in a consistent bottom-line speedup in convergence of up to $12\times$ compared to the initial asynchronous parallel training algorithm and up to $42\times$ compared to state-of-the-art implementations (scikit-learn and h2o) on a range of multi-core CPU architectures.
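
The abstract does not spell out how the dynamic partitioning works, so the following is only a minimal sketch of one plausible reading: instead of giving each thread a fixed slice of the training examples, the assignment is re-drawn at the start of every epoch. The function names `static_partitions` and `dynamic_partitions`, the per-epoch reshuffle, and the seeding are all assumptions made for illustration and are not taken from the paper.

```python
import random


def _bounds(n_examples, n_threads):
    # Split points that give every thread a near-equal share of the examples.
    return [n_examples * t // n_threads for t in range(n_threads + 1)]


def static_partitions(n_examples, n_threads):
    """Fixed assignment: thread t owns the same contiguous slice in every epoch."""
    b = _bounds(n_examples, n_threads)
    return [list(range(b[t], b[t + 1])) for t in range(n_threads)]


def dynamic_partitions(n_examples, n_threads, epoch, seed=0):
    """Hypothetical dynamic scheme: shuffle all example indices once per epoch,
    then cut the shuffled order into one chunk per thread."""
    rng = random.Random(seed * 1_000_003 + epoch)  # reproducible per-epoch stream
    order = list(range(n_examples))
    rng.shuffle(order)
    b = _bounds(n_examples, n_threads)
    return [order[b[t]:b[t + 1]] for t in range(n_threads)]


if __name__ == "__main__":
    for epoch in range(3):
        print(epoch, dynamic_partitions(10, 3, epoch))
```

Under this reading, every example is still visited exactly once per epoch, but the set of examples a given thread processes changes from epoch to epoch rather than staying fixed.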

artificial intelligence, dataset, machine learning


