Gradient Sparsification for Communication-Efficient Distributed Optimization

Wangni, Jianqiao, Wang, Jialei, Liu, Ji, Zhang, Tong

arXiv.org Machine Learning 

Modern large-scale machine learning applications require scaling stochastic optimization algorithms to distributed computational architectures. A key bottleneck is the communication overhead of exchanging information among workers. For example, suppose n training examples are distributed across M workers, each of which holds its own local copy of the model parameter vector. In the synchronized stochastic gradient method, each worker processes a random minibatch of its training data; the local updates are then synchronized by an All-Reduce step, which aggregates the stochastic gradients from all workers, followed by a Broadcast step, which transmits the updated parameter vector back to all workers. The process is repeated until an appropriate convergence criterion is met. Because this exchange takes place at every iteration, the communication cost among workers can significantly slow down the optimization.
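The synchronized stochastic gradient loop described above can be sketched in a single process. This is a minimal illustration under stated assumptions, not the authors' implementation: the least-squares objective, the averaging used as the All-Reduce aggregation, the fixed step count in place of a convergence criterion, and all names (`minibatch_gradient`, `all_reduce_mean`, `synchronous_sgd`, `make_shards`) are hypothetical choices made for this example. Workers are simulated as list entries rather than separate machines.

```python
import random


def minibatch_gradient(w, shard, batch_size, rng):
    """Least-squares gradient on a random minibatch of one worker's shard."""
    batch = rng.sample(shard, batch_size)
    grad = [0.0] * len(w)
    for x, y in batch:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj / batch_size
    return grad


def all_reduce_mean(vectors):
    """Stand-in for All-Reduce (assumption: aggregation is a plain average)."""
    m = len(vectors)
    return [sum(column) / m for column in zip(*vectors)]


def synchronous_sgd(shards, dim, steps=300, lr=0.2, batch_size=8, seed=0):
    """Simulate M workers; returns every worker's local parameter copy."""
    rng = random.Random(seed)
    # Each worker owns its local copy of the model parameter vector.
    local_w = [[0.0] * dim for _ in shards]
    for _ in range(steps):  # assumption: fixed budget instead of a convergence test
        # Each worker computes a stochastic gradient on its own data.
        grads = [minibatch_gradient(w, shard, batch_size, rng)
                 for w, shard in zip(local_w, shards)]
        # All-Reduce: aggregate the stochastic gradients from all workers.
        aggregated = all_reduce_mean(grads)
        updated = [wi - lr * gi for wi, gi in zip(local_w[0], aggregated)]
        # Broadcast: send the updated parameter vector back to all workers.
        local_w = [list(updated) for _ in shards]
    return local_w


def make_shards(n, num_workers, true_w, seed=1):
    """Synthetic noise-free regression data, split evenly across workers."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(a * b for a, b in zip(true_w, x))
        data.append((x, y))
    return [data[k::num_workers] for k in range(num_workers)]


if __name__ == "__main__":
    true_w = [2.0, -1.0, 0.5]
    shards = make_shards(n=400, num_workers=4, true_w=true_w)
    copies = synchronous_sgd(shards, dim=len(true_w))
    print("worker 0 parameters:", copies[0])
```

In this sketch, one full gradient vector per worker crosses the (simulated) network at every iteration, plus one parameter vector on the way back, which is the per-step communication volume the abstract identifies as the bottleneck.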
