Reviews: TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
The authors address the communication cost of distributed deep learning. They propose a new method, ternary gradients (TernGrad), to reduce the overhead of gradient synchronization. The technique is sound, and the authors conducted experiments to evaluate their method. Overall, the method is novel and interesting and may have a significant impact on large-scale deep learning. Some comments follow to help the authors improve their paper.
TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary gradients to accelerate distributed deep learning in data parallelism. Our approach requires only three numerical levels {-1,0,1}, which can aggressively reduce the communication time. We mathematically prove the convergence of TernGrad under the assumption of a bound on gradients. Guided by the bound, we propose layer-wise ternarizing and gradient clipping to improve its convergence. Our experiments show that applying TernGrad on AlexNet does not incur any accuracy loss and can even improve accuracy. The accuracy loss of GoogLeNet induced by TernGrad is less than 2% on average. Finally, a performance model is proposed to study the scalability of TernGrad. Experiments show significant speed gains for various deep neural networks.
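Concretely, the ternarization the abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: the function names are invented here, and the clip_factor value is a placeholder (the paper derives its clipping from the gradient bound rather than fixing one universal constant).

```python
import numpy as np

def ternarize(grad, rng=None):
    """Stochastically ternarize a gradient tensor onto {-s, 0, +s}.

    With s = max(|g|), each component becomes s * sign(g_i) with
    probability |g_i| / s and 0 otherwise, so E[ternarize(g)] = g:
    the compressed gradient is an unbiased estimate of the original.
    """
    rng = rng or np.random.default_rng()
    s = np.abs(grad).max()
    if s == 0.0:
        return grad                                    # nothing to compress
    keep = rng.random(grad.shape) < np.abs(grad) / s   # Bernoulli(|g_i| / s)
    return s * np.sign(grad) * keep

def ternarize_layerwise(grads, clip_factor=2.5, rng=None):
    """Clip each layer's gradient to clip_factor * std, then ternarize the
    layer with its own scaler s, so one layer's outliers cannot inflate
    the shared scaler of another layer (clip_factor is an assumption)."""
    rng = rng or np.random.default_rng()
    out = []
    for g in grads:
        c = clip_factor * g.std()
        out.append(ternarize(np.clip(g, -c, c), rng))
    return out
```

Only the per-layer scaler s plus one of three levels per component needs to be communicated, which is where the bandwidth saving comes from.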
Distributed Learning with Compressed Gradient Differences
Mishchenko, Konstantin, Gorbunov, Eduard, Takáč, Martin, Richtárik, Peter
Training very large machine learning models requires a distributed computing approach, with communication of the model updates often being the bottleneck. For this reason, several methods based on the compression (e.g., sparsification and/or quantization) of the updates were recently proposed, including QSGD (Alistarh et al., 2017), TernGrad (Wen et al., 2017), SignSGD (Bernstein et al., 2018), and DQGD (Khirirat et al., 2018). However, none of these methods are able to learn the gradients, which means that they necessarily suffer from several issues, such as the inability to converge to the true optimum in the batch mode, inability to work with a nonsmooth regularizer, and slow convergence rates. In this work we propose a new distributed learning method---DIANA---which resolves these issues via compression of gradient differences. We perform a theoretical analysis in the strongly convex and nonconvex settings and show that our rates are vastly superior to existing rates. Our analysis of block-quantization and differences between $\ell_2$ and $\ell_\infty$ quantization closes the gaps in theory and practice. Finally, by applying our analysis technique to TernGrad, we establish the first convergence rate for this method.
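A schematic sketch of the gradient-difference mechanism may help. The quantizer choice, the alpha value, and the DianaWorker layout below are illustrative assumptions, not the paper's reference implementation:

```python
import numpy as np

def sign_quantize(v):
    """A simple norm-scaled sign compressor, standing in for whatever
    quantizer the scheme is instantiated with (illustrative choice)."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    return norm * np.sign(v) / np.sqrt(v.size)

class DianaWorker:
    """Sketch of one worker's state for compressed gradient differences.

    Rather than compressing the raw gradient g, the worker compresses
    g - h against a reference h that is mirrored on the server; both
    sides then shift h by alpha times the compressed difference, so h
    gradually learns the gradient and the transmitted residual shrinks.
    """
    def __init__(self, dim, alpha=0.5):
        self.h = np.zeros(dim)   # reference vector, kept in sync with server
        self.alpha = alpha       # step size for updating the reference

    def step(self, grad):
        delta_hat = sign_quantize(grad - self.h)  # only this is transmitted
        self.h += self.alpha * delta_hat          # server applies the same shift
        return delta_hat                          # server recovers h + delta_hat
```

Because h converges toward the gradient, the residual being compressed goes to zero, which is what lets this approach reach the true optimum where raw-gradient compressors cannot.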
ATOMO: Communication-efficient Learning via Atomic Sparsification
Wang, Hongyi, Sievert, Scott, Liu, Shengchao, Charles, Zachary, Papailiopoulos, Dimitris, Wright, Stephen
Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance. We show that recent methods such as QSGD and TernGrad are special cases of ATOMO, and that sparsifying the singular value decomposition of neural network gradients, rather than their coordinates, can lead to significantly faster distributed training.
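The following sketch illustrates the unbiased-sparsification idea on a vector of atomic coefficients. The proportional, capped choice of keep-probabilities is a one-shot simplification of ATOMO's variance-minimizing solution, and the function name is invented here:

```python
import numpy as np

def atomo_sparsify(coeffs, budget, rng=None):
    """Unbiased sparsification of atomic coefficients under a sparsity budget.

    Coefficient lambda_i is kept with probability p_i and rescaled to
    lambda_i / p_i, so the expectation equals the input exactly. Here
    p_i is proportional to |lambda_i| and capped at 1, a simplified
    stand-in for the variance-minimizing probabilities.
    """
    rng = rng or np.random.default_rng()
    mags = np.abs(coeffs)
    total = mags.sum()
    if total == 0.0:
        return coeffs
    p = np.minimum(1.0, budget * mags / total)  # expected number kept <= budget
    keep = rng.random(coeffs.shape) < p
    out = np.zeros_like(coeffs)
    out[keep] = coeffs[keep] / p[keep]          # rescaling keeps it unbiased
    return out
```

With the element-wise decomposition this recovers QSGD- and TernGrad-style schemes; applying the same sampler to the singular values of a layer's gradient matrix yields the SVD variant the abstract highlights.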