Sparsified SGD with Memory

Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi

Neural Information Processing Systems 

Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many