A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
–Neural Information Processing Systems
In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically.
Neural Information Processing Systems
Aug-17-2025, 11:14:27 GMT
- Country:
- North America > United States
- Texas > Brazos County
- College Station (0.04)
- California > Los Angeles County
- Long Beach (0.04)
- Texas > Brazos County
- Europe > United Kingdom
- England > West Midlands > Birmingham (0.04)
- North America > United States
- Genre:
- Research Report (0.93)
- Technology: