Reviews: A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication

Neural Information Processing Systems 

Especially the ones related to experiments and parameter tuning. Since the authors appropriately responded to some of my key criticisms, I will upgrade my score from 5 to 6.

Original review:

This paper analyzes the convergence rate of distributed mini-batch SGD with sparse and quantized communication, with application to deep learning. Based on the analysis, it proposes combining sparse and quantized communication to further reduce the communication cost that dominates wall-clock runtime in distributed setups. The paper is generally well written. The convergence analysis appears to make sense, and the proposed combination of sparsification and quantization seems to reduce runtime slightly, given proper parameter tuning.