Communication-efficient SGD: From Local SGD to One-Shot Averaging

Neural Information Processing Systems 

This method requires every worker to share its computed gradients with all other workers at every iteration. We will refer to this method as "synchronized parallel SGD."
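To make the communication pattern concrete, here is a minimal single-process sketch of synchronized parallel SGD, assuming a toy one-dimensional least-squares problem. The function names (`make_shards`, `stochastic_gradient`, `synchronized_parallel_sgd`), the step size, and the data are all hypothetical choices for illustration and are not taken from the paper. The workers are simulated in a loop, and the averaging step stands in for the gradient exchange that happens at every iteration.

```python
import random


def make_shards(num_workers=4, points_per_worker=25, true_w=3.0, seed=0):
    """Build one noiseless data shard per worker for the toy model y = true_w * x."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(points_per_worker)]
        shards.append([(x, true_w * x) for x in xs])
    return shards


def stochastic_gradient(w, shard, rng):
    """Gradient of 0.5 * (w * x - y)**2 at one point sampled from the shard."""
    x, y = rng.choice(shard)
    return (w * x - y) * x


def synchronized_parallel_sgd(shards, w0=0.0, lr=0.5, num_iters=200, seed=1):
    """Simulate synchronized parallel SGD (illustrative sketch only).

    Every iteration, each worker computes a stochastic gradient at the
    shared iterate, the gradients are averaged (the communication step),
    and all workers apply the same update.
    """
    rng = random.Random(seed)
    w = w0
    comm_rounds = 0
    for _ in range(num_iters):
        # Each worker computes a gradient on its own shard.
        grads = [stochastic_gradient(w, shard, rng) for shard in shards]
        # Communication: gradients are shared and averaged at every iteration.
        avg_grad = sum(grads) / len(grads)
        comm_rounds += 1
        # All workers take the identical step, so they stay in sync.
        w -= lr * avg_grad
    return w, comm_rounds


if __name__ == "__main__":
    w, rounds = synchronized_parallel_sgd(make_shards())
    print(f"estimate: {w:.6f}, communication rounds: {rounds}")
```

In this sketch the number of communication rounds equals the number of iterations, which is the cost that local SGD and one-shot averaging aim to reduce.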
