A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication

Feb-14-2020, 10:57:01 GMT–Neural Information Processing Systems

Large communication overhead is a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous works have demonstrated the potential of gradient sparsification and quantization to reduce the communication cost. However, it is still not well understood how sparse and quantized communication affects the convergence rate of the training algorithm. In this paper, we study the convergence rate of distributed SGD for non-convex optimization with two communication-reducing strategies: sparse parameter averaging and gradient quantization. We show that an $O(1/\sqrt{MK})$ convergence rate can be achieved if the sparsification and quantization hyperparameters are configured properly.
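The two communication-reducing strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the function names `stochastic_quantize` and `sparse_average`, the uniform-level stochastic rounding scheme, and the random choice of coordinates to average are all assumptions made for illustration, since the excerpt does not specify how either strategy is implemented.

```python
import numpy as np

def stochastic_quantize(v, levels, rng):
    """Illustrative unbiased quantizer (an assumed scheme, not the paper's).

    Each |v_i| / ||v|| is rounded up or down to a multiple of 1/levels,
    with probabilities chosen so that the expectation of the output is v.
    """
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels       # values in [0, levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                 # probability of rounding up
    rounded = lower + (rng.random(v.shape) < prob_up)
    return np.sign(v) * rounded * norm / levels

def sparse_average(params, fraction, rng):
    """Average a random subset of coordinates across workers (assumed scheme).

    params has shape (M, d): one row of d parameters per worker.
    Only the selected coordinates would need to be communicated.
    """
    num_workers, dim = params.shape
    k = max(1, int(round(fraction * dim)))
    idx = rng.choice(dim, size=k, replace=False)
    out = params.copy()
    out[:, idx] = params[:, idx].mean(axis=0)  # broadcast the mean to every worker
    return out, idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.normal(size=8)
    print(stochastic_quantize(grad, levels=4, rng=rng))
    workers = rng.normal(size=(4, 8))          # 4 hypothetical workers
    averaged, chosen = sparse_average(workers, fraction=0.25, rng=rng)
    print(chosen, averaged[:, chosen])
```

Reading $M$ as the number of workers and $K$ as the number of iterations (the excerpt does not define them), an $O(1/\sqrt{MK})$ rate means that reaching a fixed accuracy $\epsilon$ takes on the order of $1/(M\epsilon^2)$ iterations, so the iteration count shrinks in proportion to $M$. That is the sense of "linear speedup" in the title.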

convergence rate, linear speedup analysis, sparse and quantized communication

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.75)
  - Statistical Learning > Gradient Descent (0.64)