A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
Neural Information Processing Systems
In distributed training of deep neural networks, each machine typically runs Stochastic Gradient Descent (SGD) or one of its variants locally and communicates with the other machines only periodically.
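To make this setup concrete, below is a minimal sketch of the "local SGD with periodic averaging" pattern the abstract describes, combined with per-worker gradient clipping. It is an illustrative simulation, not the paper's algorithm: the toy least-squares objective, worker count, clipping threshold, local-step count, and all variable names are assumptions introduced here.

```python
# Hedged sketch: simulates communication-efficient local SGD with per-worker
# gradient clipping on a toy least-squares problem. All hyperparameters
# (n_workers, H, clip_norm, lr, rounds) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each worker holds its own shard of a linear-regression problem.
d, n_per_worker, n_workers = 10, 128, 4
w_true = rng.normal(size=d)
shards = []
for _ in range(n_workers):
    X = rng.normal(size=(n_per_worker, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_per_worker)
    shards.append((X, y))

def clipped_grad(w, X, y, clip_norm):
    """Minibatch least-squares gradient, rescaled to norm at most clip_norm."""
    idx = rng.integers(0, len(y), size=32)
    Xb, yb = X[idx], y[idx]
    g = Xb.T @ (Xb @ w - yb) / len(yb)
    norm = np.linalg.norm(g)
    return g if norm <= clip_norm else g * (clip_norm / norm)

# Each worker takes H local clipped-SGD steps with no communication; then
# all workers average their parameters (one communication round).
lr, clip_norm, H, rounds = 0.05, 1.0, 8, 50
workers = [np.zeros(d) for _ in range(n_workers)]
for _ in range(rounds):
    for i, (X, y) in enumerate(shards):
        for _ in range(H):                      # local steps, no communication
            workers[i] -= lr * clipped_grad(workers[i], X, y, clip_norm)
    avg = np.mean(workers, axis=0)              # periodic model averaging
    workers = [avg.copy() for _ in range(n_workers)]

print("distance to w_true:", np.linalg.norm(workers[0] - w_true))
```

Communicating only every H steps is what makes this scheme communication-efficient relative to synchronizing gradients at every iteration; the clipping step bounds the influence of any single minibatch gradient on the local updates.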