Ordered Momentum for Asynchronous SGD

Chang-Wei Shi, Yi-Rui Yang, Wu-Jun Li

Neural Information Processing Systems 

Distributed learning is essential for training large-scale deep models.