BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training
Songtao Wang, Dan Li, Yang Cheng, Jinkun Geng, Yanshu Wang, Shuai Wang, Shu-Tao Xia, Jianping Wu
Neural Information Processing Systems
In distributed machine learning (DML), the network performance between machines significantly impacts the speed of iterative training. In this paper, we propose BML, a new gradient synchronization algorithm with higher network performance and lower network cost than current practice. BML runs on a BCube network rather than the traditional Fat-Tree topology. The BML algorithm is designed such that, compared to the parameter server (PS) algorithm on a Fat-Tree network connecting the same number of server machines, BML theoretically achieves 1/k of the gradient synchronization time using only k/5 of the switches (typical values of k are 2∼4). Experiments with the LeNet-5 and VGG-19 benchmarks on a testbed of 9 dual-GPU servers show that BML reduces the job completion time of DML training by up to 56.4%.
Dec-31-2018
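As a rough illustration of the abstract's headline claim, the sketch below plugs its two stated ratios (1/k of the PS sync time, k/5 of the Fat-Tree switch count) into a toy comparison. The baseline values `ps_sync_time` and `fat_tree_switches` are hypothetical placeholders, not figures from the paper.

```python
# Illustrative arithmetic only: encodes the two ratios stated in the
# abstract (sync time scaled by 1/k, switch count scaled by k/5).
# Baseline numbers below are made up for the example.

def bml_vs_ps(ps_sync_time: float, fat_tree_switches: int, k: int):
    """Return BML's theoretical sync time and switch count relative to
    a PS/Fat-Tree deployment connecting the same number of servers."""
    bml_sync_time = ps_sync_time / k           # 1/k of PS sync time
    bml_switches = fat_tree_switches * k / 5   # k/5 of Fat-Tree switches
    return bml_sync_time, bml_switches

for k in (2, 3, 4):  # the abstract notes typical k is 2~4
    t, s = bml_vs_ps(ps_sync_time=1.0, fat_tree_switches=100, k=k)
    print(f"k={k}: sync time x{t:.2f}, switches x{s / 100:.2f}")
```

For k=3, for example, this gives one third of the synchronization time with 60% of the switches, matching the abstract's claim of both higher performance and lower network cost.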