A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

Neural Information Processing Systems 

In distributed training of deep neural networks, one typically runs Stochastic Gradient Descent (SGD) or a variant on each machine and communicates with the other machines periodically.
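Below is a minimal, self-contained sketch of this pattern, often called local SGD: each worker takes SGD steps on its own stochastic gradients and the workers synchronize by averaging their parameters every few steps. The toy quadratic objective, the worker count, the synchronization period, and the simple per-worker gradient norm clipping shown here are illustrative assumptions for this sketch, not the paper's specific algorithm.

```python
# Sketch of local SGD with periodic parameter averaging and per-worker
# gradient norm clipping, on a toy quadratic objective. Illustrative only;
# hyperparameters and the clipping rule are assumptions, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

dim = 10
num_workers = 4     # number of machines
sync_every = 8      # local steps between communication rounds
lr = 0.05
max_norm = 1.0      # clipping threshold (assumed for illustration)
total_steps = 200

# Toy objective: f(x) = 0.5 * ||x - x_star||^2, with Gaussian gradient noise
# standing in for minibatch sampling on each worker.
x_star = rng.normal(size=dim)

def stochastic_grad(x):
    """Exact gradient of the quadratic plus noise (simulates a minibatch)."""
    return (x - x_star) + 0.1 * rng.normal(size=dim)

def clip(g, max_norm):
    """Rescale g so its Euclidean norm is at most max_norm."""
    norm = np.linalg.norm(g)
    return g * min(1.0, max_norm / norm) if norm > 0 else g

# Each worker maintains its own copy of the parameters.
params = [np.zeros(dim) for _ in range(num_workers)]

for step in range(1, total_steps + 1):
    # Local step on every machine; no communication here.
    for k in range(num_workers):
        params[k] -= lr * clip(stochastic_grad(params[k]), max_norm)

    # Periodic communication: average parameters across machines.
    if step % sync_every == 0:
        avg = np.mean(params, axis=0)
        params = [avg.copy() for _ in range(num_workers)]

print("distance to optimum:", np.linalg.norm(params[0] - x_star))
```

Averaging only every `sync_every` steps, rather than after every step, is what makes the scheme communication-efficient: communication cost drops by that factor at the price of the workers' iterates drifting apart between rounds.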
