SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization
Navjot Singh, Deepesh Data, Jemin George, Suhas Diggavi
In this paper, we study communication-efficient decentralized training of large-scale machine learning models over a network. We propose and analyze SQuARM-SGD, a decentralized training algorithm that employs momentum and compressed communication between nodes, regulated by a locally computable triggering rule. In SQuARM-SGD, each node performs a fixed number of local SGD (stochastic gradient descent) steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors only when there is a significant change in its model parameters since the last communication. We provide convergence guarantees of our algorithm for strongly-convex and non-convex smooth objectives. We believe ours is the first theoretical analysis of compressed decentralized SGD with momentum updates. We show that SQuARM-SGD converges at rate $\mathcal{O}\left(\frac{1}{nT}\right)$ for strongly-convex objectives and at rate $\mathcal{O}\left(\frac{1}{\sqrt{nT}}\right)$ for non-convex objectives, thus matching the convergence rate of \emph{vanilla} distributed SGD in both settings. We corroborate our theoretical understanding with experiments and compare the performance of our algorithm with the state-of-the-art, showing that, without sacrificing much accuracy, SQuARM-SGD converges at a similar rate while saving significantly in total communicated bits.
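To make the per-node update described in the abstract concrete, below is a minimal sketch in Python/NumPy of one communication round at a single node: a fixed number of local Nesterov-momentum SGD steps, followed by an event-triggered, sparsified-and-quantized transmission of the model change. The specific compression operator (top-k sparsification followed by scaled-sign quantization), the trigger threshold, and all function and parameter names (`topk_sparsify`, `sign_quantize`, `squarm_node_round`, `lr`, `beta`, `local_steps`, `k`, `threshold`) are illustrative assumptions rather than the paper's exact choices, and the gossip step in which each node mixes its neighbors' shared copies is omitted.

```python
import numpy as np

def topk_sparsify(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest (illustrative sparsifier)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def sign_quantize(v):
    """Scaled-sign quantization of the retained coordinates (illustrative quantizer)."""
    nonzero = max(np.count_nonzero(v), 1)
    scale = np.linalg.norm(v, ord=1) / nonzero
    return scale * np.sign(v)

def squarm_node_round(x, u, x_hat, grad_fn, lr, beta, local_steps, k, threshold):
    """One communication round at a single node (sketch; gossip/mixing step omitted).

    x       : current local model parameters
    u       : Nesterov momentum buffer
    x_hat   : copy of this node's model as last shared with its neighbors
    grad_fn : stochastic gradient oracle
    Returns (x, u, x_hat, message); message is None when the trigger does not fire.
    """
    # Local SGD steps with Nesterov's momentum between communications.
    for _ in range(local_steps):
        g = grad_fn(x + beta * u)   # gradient at the look-ahead point
        u = beta * u - lr * g
        x = x + u

    # Event-triggered, compressed communication: send only if the model has
    # drifted enough from the copy last shared with neighbors.
    delta = x - x_hat
    message = None
    if np.linalg.norm(delta) ** 2 > threshold:
        message = sign_quantize(topk_sparsify(delta, k))
        x_hat = x_hat + message     # neighbors apply the same compressed update
    return x, u, x_hat, message

# Toy usage on a noisy quadratic objective f(x) = 0.5 * ||x||^2 (gradient = x + noise).
rng = np.random.default_rng(0)
x, u = rng.standard_normal(10), np.zeros(10)
x_hat = x.copy()
for _ in range(20):
    x, u, x_hat, msg = squarm_node_round(
        x, u, x_hat,
        grad_fn=lambda w: w + 0.01 * rng.standard_normal(10),
        lr=0.1, beta=0.9, local_steps=4, k=3, threshold=1e-3)
```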
Jul-19-2020