TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
Neural Information Processing Systems
High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad, which uses ternary gradients to accelerate distributed deep learning in data parallelism. Our approach requires only three numerical levels {-1, 0, 1}, which can aggressively reduce the communication time. We mathematically prove the convergence of TernGrad under the assumption of a bound on gradients. Guided by the bound, we propose layer-wise ternarizing and gradient clipping to improve its convergence.
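A minimal sketch of the stochastic ternarization described above, assuming a per-tensor scale equal to the maximum absolute gradient value (as suggested by layer-wise ternarizing); the helper name and signature are illustrative, not the paper's reference implementation:

```python
import numpy as np

def ternarize(grad, rng=np.random.default_rng()):
    """Stochastically map a gradient tensor to {-s, 0, +s}.

    s is the per-tensor scale (max absolute value). Each element keeps
    its sign with probability |g_k| / s and is zeroed otherwise, so the
    expected value of the ternary gradient equals the original gradient.
    """
    s = np.max(np.abs(grad))
    if s == 0:
        return np.zeros_like(grad)
    keep = rng.random(grad.shape) < (np.abs(grad) / s)
    return s * np.sign(grad) * keep

# Example: only the scale s and the signs need to be communicated.
g = np.array([0.4, -0.1, 0.0, 0.9])
print(ternarize(g))
```

Keeping the ternarization unbiased in expectation is what allows a convergence argument under a gradient bound, and gradient clipping limits the scale s so that small gradients are not zeroed out too aggressively.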