Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Shuai Zheng, Ziyue Huang, James Kwok

Neural Information Processing Systems 

Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence analysis relies on unrealistic assumptions, and it can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum.
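
To make the ingredients named above concrete (sign-style 1-bit compression, Nesterov's momentum, and error-feedback), the following is a minimal single-worker sketch, not the paper's actual distributed algorithm: the function names (`sign_compress`, `ef_momentum_sgd_step`), the mean-magnitude rescaling, and the exact update order are illustrative assumptions.

```python
import numpy as np

def sign_compress(v):
    """1-bit style compression: keep only the sign of each coordinate,
    rescaled by the mean magnitude so the overall scale is preserved."""
    scale = np.mean(np.abs(v))
    return scale * np.sign(v)

def ef_momentum_sgd_step(w, grad, momentum, error, lr=0.01, beta=0.9):
    """One step of compressed SGD with Nesterov momentum and error-feedback.

    The residual left over from the previous compression is added back
    into the current update before compressing again, so quantization
    error is corrected over time instead of being lost.
    """
    # Nesterov-style momentum on the raw stochastic gradient.
    momentum = beta * momentum + grad
    update = grad + beta * momentum

    # Error-feedback: compensate with the residual from the last step.
    corrected = lr * update + error
    compressed = sign_compress(corrected)
    error = corrected - compressed  # residual kept for the next step

    w = w - compressed
    return w, momentum, error
```

In a distributed setting, each worker would compress its own error-corrected update and the server would aggregate the compressed messages (e.g., by combining the workers' signs) before broadcasting the result; the sketch above only shows the per-worker compression and error-feedback logic.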