QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

Alistarh, Dan; Grubic, Demjan; Li, Jerry; Tomioka, Ryota; Vojnovic, Milan

Feb-14-2020, 08:41:46 GMT–Neural Information Processing Systems

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to SGD's excellent scalability properties. A fundamental barrier when parallelizing SGD is the high bandwidth cost of communicating gradient updates between nodes; consequently, several lossy compression heuristics have been proposed, in which nodes communicate only quantized gradients. Although effective in practice, these heuristics do not always guarantee convergence, and it is not clear whether they can be improved. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes for gradient updates that provides convergence guarantees. QSGD allows the user to smoothly trade off communication bandwidth and convergence time: nodes can adjust the number of bits sent per iteration, at the cost of possibly higher variance.
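
The abstract states the bits-versus-variance trade-off but does not spell out the quantizer. The following is a minimal pure-Python sketch, assuming a stochastic rounding scheme modeled on the one described in the full QSGD paper: each coordinate is normalized by the vector's L2 norm and randomly rounded to one of s uniform levels, so the result is unbiased. `qsgd_quantize` and `naive_bits` are hypothetical helpers, and the bit count assumes a plain fixed-width encoding (a 32-bit norm, then one sign bit and ceil(log2(s+1)) level bits per coordinate) rather than the more compact variable-length encoding the title alludes to.

```python
import math
import random


def qsgd_quantize(v, s, rng=random):
    """Hypothetical helper: stochastically quantize v to s uniform levels.

    Each coordinate becomes norm * sign(v_i) * level / s, with level an integer
    in [0, s]. Rounding up happens with probability equal to the fractional
    grid position, so each output coordinate equals v_i in expectation.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        r = abs(x) / norm * s              # position on the grid [0, s]
        lower = min(math.floor(r), s)      # guard against float overshoot
        p_up = r - lower                   # probability of rounding up
        level = lower + 1 if rng.random() < p_up else lower
        level = min(level, s)
        out.append(norm * math.copysign(1.0, x) * level / s)
    return out


def naive_bits(n, s):
    """Hypothetical fixed-width cost: a 32-bit norm, then 1 sign bit plus
    ceil(log2(s + 1)) level bits for each of the n coordinates."""
    return 32 + n * (1 + math.ceil(math.log2(s + 1)))


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [0.3, -0.4, 1.2]
    for s in (1, 4):
        print(s, qsgd_quantize(grad, s, rng), naive_bits(len(grad), s))
    print("float32 baseline:", 32 * len(grad))
```

Lowering `s` shrinks `naive_bits` but pushes each quantized coordinate further from its true value on average. Because the probability of rounding up equals the fractional grid position, the expected value of every coordinate still matches the original gradient; under this sketch's assumptions, compression adds variance but no bias.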

communication-efficient SGD, gradient quantization and encoding, QSGD