TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

Mar-17-2026, 15:43:02 GMT–Neural Information Processing Systems

High network communication cost for synchronizing gradients and parameters is a well-known bottleneck of distributed training. In this work, we propose TernGrad, which uses ternary gradients to accelerate distributed deep learning under data parallelism. Our approach requires only three numerical levels {-1, 0, 1}, which can aggressively reduce communication time. We mathematically prove the convergence of TernGrad under the assumption of a bound on gradients. Guided by the bound, we propose layer-wise ternarizing and gradient clipping to improve its convergence.
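To make the three-level idea concrete, the following is a minimal sketch of stochastic ternarization, not the authors' implementation. It assumes each gradient is scaled by s = max|g| and that each component is kept with probability |g_i| / s, so the quantized gradient equals the original in expectation. The function names `ternarize` and `ternarize_layerwise` and the dictionary-of-layers layout are hypothetical, chosen only for illustration.

```python
import random


def ternarize(grad, rng=None):
    """Map a list of floats to (s, signs) with signs in {-1, 0, 1}.

    Assumption for this sketch: s = max|g|, and component i is kept with
    probability |g_i| / s, so E[s * signs[i]] == grad[i].
    """
    rng = rng or random.Random()
    s = max((abs(g) for g in grad), default=0.0)
    if s == 0.0:
        # An all-zero (or empty) gradient has nothing to quantize.
        return 0.0, [0] * len(grad)
    signs = []
    for g in grad:
        keep = rng.random() < abs(g) / s  # Bernoulli(|g| / s)
        signs.append((1 if g > 0 else -1) if keep else 0)
    return s, signs


def ternarize_layerwise(layers, rng=None):
    """Ternarize each layer separately so every layer gets its own scaler.

    `layers` is a hypothetical {layer_name: list_of_floats} mapping.
    """
    rng = rng or random.Random()
    return {name: ternarize(g, rng) for name, g in layers.items()}


if __name__ == "__main__":
    grads = {"conv1": [0.5, -2.0, 0.0, 1.0], "fc": [0.01, -0.03]}
    for name, (s, signs) in ternarize_layerwise(grads, random.Random(0)).items():
        print(name, s, signs)
```

Under these assumptions a worker only needs to send one scalar per layer plus a ternary value per component, which is where the communication saving comes from. Using a separate scaler per layer keeps one layer with large gradients from forcing most components of the other layers to zero.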

artificial intelligence, machine learning, terngrad

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)