ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Chia-Yu Chen

Neural Information Processing Systems 

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication-constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing methods do not scale well to large-scale distributed systems (due to gradient build-up) and/or fail to evaluate model fidelity (test accuracy) on large datasets.