TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li

Neural Information Processing Systems 

High network communication cost for synchronizing gradients and parameters is a well-known bottleneck of distributed training. In this work, we propose TernGrad, which uses ternary gradients to accelerate data-parallel distributed deep learning. Our approach quantizes gradients to only three numerical levels {-1, 0, 1}, aggressively reducing communication time. We mathematically prove the convergence of TernGrad under a gradient-bound assumption. Guided by the bound, we propose layer-wise ternarizing and gradient clipping to improve convergence.
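As a rough illustration of the idea, the sketch below shows one way to stochastically quantize a gradient tensor to three levels {-1, 0, 1} times a scaling factor, so that the quantized gradient equals the original in expectation. The abstract itself does not specify the quantization rule; the max-absolute-value scaler and the Bernoulli keep-probability here are assumptions for this sketch.

```python
import numpy as np

def ternarize(grad, rng):
    """Stochastically map a gradient tensor to {-s, 0, +s} (a sketch, not
    necessarily the paper's exact rule)."""
    # Assumed scaler: the maximum absolute gradient value.
    s = np.max(np.abs(grad))
    if s == 0.0:
        return np.zeros_like(grad)
    # Keep each component with probability |g_i| / s; zero it otherwise.
    # This makes the quantizer unbiased: E[output] == grad.
    keep = rng.random(grad.shape) < np.abs(grad) / s
    return s * np.sign(grad) * keep
```

Because each worker then only needs to transmit the scalar `s` plus one of three symbols per component (about 2 bits, or less with entropy coding), the gradient payload shrinks dramatically compared with 32-bit floats.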