qsparse-local-sgd
CSER: Communication-efficient SGD with Error Reset
Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin
The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The first key idea in CSER is a new technique called "error reset" that adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of the resulting local residual errors. Second, we introduce partial synchronization of both the gradients and the models, leveraging the advantages of each. We prove the convergence of CSER for smooth non-convex problems. Empirical results show that when combined with highly aggressive compressors, the CSER algorithms: i) cause no loss of accuracy, and ii) accelerate training by nearly $10\times$ for CIFAR-100 and by $4.5\times$ for ImageNet.
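The abstract describes compressing gradients while tracking the residual error left behind by the compressor. A minimal sketch of that general error-feedback pattern is below (the paper's error-reset mechanism differs in detail; `topk_compress` and `step_with_error_feedback` are illustrative names, not from the paper):

```python
import numpy as np

def topk_compress(v, k):
    """Keep only the k largest-magnitude entries (a common aggressive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def step_with_error_feedback(grad, error, lr=0.1, k=2):
    """One worker step: compress the error-corrected gradient, transmit the
    compressed part, and carry the untransmitted residual into the next step."""
    corrected = grad + error            # fold in residual from previous steps
    compressed = topk_compress(corrected, k)
    new_error = corrected - compressed  # residual kept locally, never discarded
    update = lr * compressed            # only the compressed part is communicated
    return update, new_error
```

Because the residual is re-added on every step, small coordinates that a top-k compressor repeatedly drops accumulate until they are eventually transmitted, which is why such schemes can tolerate very aggressive compression.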
Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations
Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi
The communication bottleneck has been identified as a significant issue in distributed optimization of large-scale learning models. Recently, several approaches to mitigate this problem have been proposed, including different forms of gradient compression and computing local models that are mixed iteratively. In this paper we propose the \emph{Qsparse-local-SGD} algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference between the true and compressed gradients. We propose both synchronous and asynchronous implementations of \emph{Qsparse-local-SGD}. We analyze convergence for \emph{Qsparse-local-SGD} in the \emph{distributed} setting for smooth non-convex and convex objective functions. We demonstrate that \emph{Qsparse-local-SGD} converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers. We use \emph{Qsparse-local-SGD} to train ResNet-50 on ImageNet, and show that it yields significant savings over the state of the art in the number of bits transmitted to reach target accuracy.
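The combination of sparsification and quantization described above can be sketched as composing two compressors: a top-k sparsifier followed by a scaled sign quantizer. The function names and the particular quantizer below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def sparsify_topk(v, k):
    """Zero out all but the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def quantize_sign(v):
    """Scaled sign quantization: one bit per surviving coordinate plus one scale,
    chosen so the quantized vector preserves the l1 mass of the input."""
    nnz = np.count_nonzero(v)
    if nnz == 0:
        return v
    scale = np.sum(np.abs(v)) / nnz
    return scale * np.sign(v)  # np.sign(0) == 0, so zeroed coordinates stay zero

def qsparse_compress(v, k):
    """Compose the two: sparsify first, then quantize the survivors."""
    return quantize_sign(sparsify_topk(v, k))
```

Composing the two stages is what makes the per-step payload small: only k indices, k sign bits, and one scalar need to be transmitted, and the error-compensation loop absorbs whatever both stages discard.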