Reviews: Communication-efficient Distributed SGD with Sketching

Neural Information Processing Systems 

The quality is adequate; the authors show familiarity with, and build on ideas from, the relevant literature. The experimental setup (image classification and neural machine translation) is also relevant. The work is very clear and well written. The proposed method could significantly reduce training time for practitioners and researchers, but in my opinion it needs additional empirical validation. Taken together, the bounded second moment and bounded variance assumptions are quite strong.