Gradient Sparsification for Communication-Efficient Distributed Optimization

Wangni, Jianqiao, Wang, Jialei, Liu, Ji, Zhang, Tong

Oct-26-2017–arXiv.org Machine Learning

Modern large-scale machine learning applications require scaling stochastic optimization algorithms to distributed computational architectures. A key bottleneck is the communication overhead of exchanging information among workers. For example, suppose n training examples are distributed across M workers, each of which holds a local copy of the model parameter vector. In the synchronized stochastic gradient method, each worker processes a random minibatch of its local training data; the local updates are then synchronized by an All-Reduce step, which aggregates the stochastic gradients from all workers, followed by a Broadcast step, which transmits the updated parameter vector back to all workers. The process is repeated until an appropriate convergence criterion is met. Because every iteration exchanges gradient and parameter vectors among all M workers, this communication cost can significantly slow down the optimization.
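The synchronized loop described above can be sketched in a single process. This is only an illustration of the baseline All-Reduce and Broadcast pattern, not the sparsification method the paper proposes. The least-squares objective, the synthetic data, and every name in the block (`local_gradient`, `all_reduce_mean`, `sync_sgd`, `make_shards`) are assumptions made for the example, and the "workers" are simulated as list entries rather than real processes.

```python
import random

def local_gradient(w, batch):
    """Mean gradient of 0.5 * (w.x - y)^2 over one worker's minibatch."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / len(batch)
    return g

def all_reduce_mean(grads):
    """Stand-in for All-Reduce: average the workers' gradient vectors."""
    m = len(grads)
    return [sum(col) / m for col in zip(*grads)]

def sync_sgd(shards, dim, steps=200, batch_size=8, lr=0.5, seed=0):
    """Simulated synchronized SGD over M = len(shards) workers."""
    rng = random.Random(seed)
    w = [0.0] * dim  # parameter vector, identical on every worker
    for _ in range(steps):
        # Each worker draws a random minibatch from its own shard.
        grads = [local_gradient(w, rng.sample(shard, batch_size))
                 for shard in shards]
        g = all_reduce_mean(grads)                  # All-Reduce step
        w = [wi - lr * gi for wi, gi in zip(w, g)]  # update, then Broadcast
    return w

def make_shards(n=400, num_workers=4, true_w=(2.0, -1.0), seed=1):
    """Hypothetical noiseless linear data, split evenly across workers."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(t * xi for t, xi in zip(true_w, x))
        data.append((x, y))
    return [data[i::num_workers] for i in range(num_workers)]

shards = make_shards()
w = sync_sgd(shards, dim=2)
```

In this sketch each iteration moves one full gradient vector per worker through `all_reduce_mean` and one full parameter vector back, which is the per-step communication volume the abstract identifies as the bottleneck.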
