Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices

Neural Information Processing Systems

Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes.