CSER: Communication-efficient SGD with Error Reset

Neural Information Processing Systems

The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks.