SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization
Navjot Singh, Deepesh Data, Jemin George, Suhas Diggavi
In this paper, we study communication-efficient decentralized training of large-scale machine learning models over a network. We propose and analyze SQuARM-SGD, a decentralized training algorithm that employs momentum and compressed communication between nodes, regulated by a locally computable triggering rule. In SQuARM-SGD, each node performs a fixed number of local SGD (stochastic gradient descent) steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors only when there is a significant change in its model parameters since the last communication. We provide convergence guarantees of our algorithm for strongly-convex and non-convex smooth objectives. We believe ours is the first theoretical analysis of compressed decentralized SGD with momentum updates. We show that SQuARM-SGD converges at rate $\mathcal{O}\left(\frac{1}{nT}\right)$ for strongly-convex objectives and at rate $\mathcal{O}\left(\frac{1}{\sqrt{nT}}\right)$ for non-convex objectives, thus matching the convergence rate of \emph{vanilla} distributed SGD in both settings. We corroborate our theoretical understanding with experiments and compare the performance of our algorithm with the state-of-the-art, showing that, without sacrificing much accuracy, SQuARM-SGD converges at a similar rate while saving significantly in total communicated bits.
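To make the per-node update described in the abstract concrete, below is a minimal sketch in Python/NumPy of one communication round at a single node: a fixed number of local Nesterov-momentum SGD steps, followed by an event-triggered, sparsified-and-quantized transmission of the model change. The specific compression operator (top-k sparsification followed by scaled-sign quantization), the trigger threshold, and all function and parameter names (`topk_sparsify`, `sign_quantize`, `squarm_node_round`, `lr`, `beta`, `local_steps`, `k`, `threshold`) are illustrative assumptions rather than the paper's exact choices, and the gossip step in which each node mixes its neighbors' shared copies is omitted.

```python
import numpy as np

def topk_sparsify(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest (illustrative sparsifier)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def sign_quantize(v):
    """Scaled-sign quantization of the retained coordinates (illustrative quantizer)."""
    nonzero = max(np.count_nonzero(v), 1)
    scale = np.linalg.norm(v, ord=1) / nonzero
    return scale * np.sign(v)

def squarm_node_round(x, u, x_hat, grad_fn, lr, beta, local_steps, k, threshold):
    """One communication round at a single node (sketch; gossip/mixing step omitted).

    x       : current local model parameters
    u       : Nesterov momentum buffer
    x_hat   : copy of this node's model as last shared with its neighbors
    grad_fn : stochastic gradient oracle
    Returns (x, u, x_hat, message); message is None when the trigger does not fire.
    """
    # Local SGD steps with Nesterov's momentum between communications.
    for _ in range(local_steps):
        g = grad_fn(x + beta * u)   # gradient at the look-ahead point
        u = beta * u - lr * g
        x = x + u

    # Event-triggered, compressed communication: send only if the model has
    # drifted enough from the copy last shared with neighbors.
    delta = x - x_hat
    message = None
    if np.linalg.norm(delta) ** 2 > threshold:
        message = sign_quantize(topk_sparsify(delta, k))
        x_hat = x_hat + message     # neighbors apply the same compressed update
    return x, u, x_hat, message

# Toy usage on a noisy quadratic objective f(x) = 0.5 * ||x||^2 (gradient = x + noise).
rng = np.random.default_rng(0)
x, u = rng.standard_normal(10), np.zeros(10)
x_hat = x.copy()
for _ in range(20):
    x, u, x_hat, msg = squarm_node_round(
        x, u, x_hat,
        grad_fn=lambda w: w + 0.01 * rng.standard_normal(10),
        lr=0.1, beta=0.9, local_steps=4, k=3, threshold=1e-3)
```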
Jul-19-2020