A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
Neural Information Processing Systems
In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically.
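The communication pattern described above can be sketched in a few lines: each worker runs SGD steps on its local copy of the model and the workers periodically average their parameters. This is an illustrative "local SGD" style sketch, not the paper's algorithm; the quadratic objective, worker count, and step sizes are all hypothetical choices for the example.

```python
import random

def noisy_grad(w, rng):
    # Gradient of the toy objective f(w) = (w - 3)^2, plus zero-mean noise
    # to mimic a stochastic mini-batch gradient.
    return 2.0 * (w - 3.0) + rng.gauss(0.0, 0.1)

def local_sgd(num_workers=4, rounds=50, local_steps=8, lr=0.05, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * num_workers            # each worker's local model copy
    for _ in range(rounds):
        # Local phase: every worker takes SGD steps with no communication.
        for i in range(num_workers):
            for _ in range(local_steps):
                weights[i] -= lr * noisy_grad(weights[i], rng)
        # Communication phase: workers average their models periodically.
        avg = sum(weights) / num_workers
        weights = [avg] * num_workers
    return weights[0]

print(local_sgd())
```

Running the sketch drives the averaged model toward the optimum of the toy objective (w* = 3); in practice the local phase is what saves communication relative to averaging after every step.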