bidirectional compression
Preserved central model for faster bidirectional compression in distributed settings
Philippenko, Constantin, Dieuleveut, Aymeric
We develop a new approach to tackle communication constraints in a distributed learning problem with a central server. We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as algorithms using only uplink (from the local workers to the central server) compression. To obtain this improvement, we design MCM, an algorithm such that the downlink compression only impacts local models, while the global model is preserved. As a result, and contrary to previous works, the gradients on the local workers are computed on perturbed models. Consequently, convergence proofs are more challenging and require precise control of this perturbation. To ensure this, MCM additionally combines model compression with a memory mechanism. This analysis opens new doors.
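To make the preserved-central-model idea concrete, here is a minimal single-process sketch of one possible MCM-style loop: the server keeps its model w exact, broadcasts only a compressed correction to a shared memory H, workers take gradients at the resulting perturbed model, and the memory update damps the perturbation over time. The random-k compressor, the constants, and the toy least-squares workers are illustrative assumptions of this sketch, not the paper's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k(x, k):
    """Unbiased random-k sparsification: keep k coordinates, rescale by d/k."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros(d)
    out[idx] = (d / k) * x[idx]
    return out

def grad(w, A, b):
    """Gradient of the local least-squares loss ||Aw - b||^2 / (2n)."""
    return A.T @ (A @ w - b) / len(b)

d, k, lr, alpha = 20, 4, 0.1, 0.5              # illustrative constants
workers = [(rng.normal(size=(30, d)), rng.normal(size=30)) for _ in range(4)]
w = np.zeros(d)                                # global model, kept exact on the server
H = np.zeros(d)                                # downlink memory shared with the workers

for step in range(300):
    # Downlink: broadcast only a compressed correction to the memory,
    # leaving the server's own copy of w untouched by compression.
    delta = rand_k(w - H, k)
    w_hat = H + delta                          # perturbed model seen by the workers
    # Uplink: workers compress gradients taken at the perturbed model.
    g = np.mean([rand_k(grad(w_hat, A, b), k) for A, b in workers], axis=0)
    w -= lr * g                                # exact update of the preserved model
    H += alpha * delta                         # memory damps the perturbation

loss = np.mean([np.linalg.norm(A @ w - b)**2 / (2 * len(b)) for A, b in workers])
print("final average loss:", loss)
```

Note the asymmetry with naive bidirectional schemes: only w_hat, never w itself, is touched by downlink compression, which is what allows the rate to match uplink-only compression.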
Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression
Huang, Xinmeng, Chen, Yiming, Yin, Wotao, Yuan, Kun
Recent advances in distributed optimization and learning have shown that communication compression is one of the most effective means of reducing communication. While there have been many results on convergence rates under communication compression, a theoretical lower bound is still missing. Analyses of algorithms with communication compression have attributed convergence to two abstract properties: the unbiased property or the contractive property. They can be applied with either unidirectional compression (only messages from workers to server are compressed) or bidirectional compression. In this paper, we consider distributed stochastic algorithms for minimizing smooth and non-convex objective functions under communication compression. We establish a convergence lower bound for algorithms using either unbiased or contractive compressors, in either the unidirectional or the bidirectional setting. To close the gap between this lower bound and the existing upper bounds, we further propose an algorithm, NEOLITHIC, which almost reaches our lower bound (up to logarithmic factors) under mild conditions. Our results also show that using contractive bidirectional compression can yield iterative methods that converge as fast as those using unbiased unidirectional compression. Experimental results validate our findings.
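The two abstract properties referenced above are usually stated as: an unbiased compressor satisfies E[C(x)] = x with E||C(x) - x||^2 <= omega * ||x||^2, while a contractive compressor satisfies E||C(x) - x||^2 <= (1 - delta) * ||x||^2 for some delta in (0, 1]. As an illustration with standard textbook examples (not code from the paper), random-k sparsification with rescaling is unbiased with omega = d/k - 1, and top-k selection is contractive with delta = k/d:

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_k(x, k):
    """Unbiased: E[C(x)] = x, and E||C(x) - x||^2 = (d/k - 1) ||x||^2."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros(d)
    out[idx] = (d / k) * x[idx]
    return out

def top_k(x, k):
    """Contractive (but biased): ||C(x) - x||^2 <= (1 - k/d) ||x||^2."""
    out = np.zeros(x.size)
    idx = np.argsort(np.abs(x))[-k:]           # k largest-magnitude coordinates
    out[idx] = x[idx]
    return out

d, k, trials = 50, 5, 20000
x = rng.normal(size=d)

samples = np.stack([rand_k(x, k) for _ in range(trials)])
print("rand-k bias ||E[C(x)] - x||:", np.linalg.norm(samples.mean(axis=0) - x))  # ~ 0
print("rand-k relative variance:",                                               # <= d/k - 1 = 9
      np.mean(np.sum((samples - x) ** 2, axis=1)) / np.sum(x ** 2))
print("top-k relative error:",                                                   # <= 1 - k/d = 0.9
      np.sum((top_k(x, k) - x) ** 2) / np.sum(x ** 2))
```

Top-k is biased (it systematically drops small coordinates), which is why the two properties have led to separate analyses, and why a single lower bound covering both settings is informative.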
Artemis: tight convergence guarantees for bidirectional compression in Federated Learning
Philippenko, Constantin, Dieuleveut, Aymeric
We introduce a new algorithm, Artemis, tackling the problem of learning in a distributed framework with communication constraints. Several (randomly sampled) workers perform the optimization process using a central server to aggregate their computations. To alleviate the communication cost, Artemis compresses the information sent in both directions (from the workers to the server and conversely), combined with a memory mechanism. It improves on existing quantized federated learning algorithms, which only consider unidirectional compression (to the server), or use very strong assumptions on the compression operator, and often do not take into account partial participation of devices. We provide fast rates of convergence (linear up to a threshold) under weak assumptions on the stochastic gradients (the noise variance is bounded only at the optimal point) in the non-i.i.d. setting, highlight the impact of memory for unidirectional and bidirectional compression, and analyze Polyak-Ruppert averaging. We use convergence in distribution to obtain a lower bound on the asymptotic variance that highlights the practical limits of compression. Finally, we provide experimental results to demonstrate the validity of our analysis.
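Below is a minimal sketch of the kind of memory mechanism Artemis relies on, under assumptions of my own (random-k compression in both directions, full participation, a toy least-squares objective): each worker compresses the difference between its gradient and a memory term h_i rather than the gradient itself, the server reconstructs h_i + C(g_i - h_i), and both sides apply the same memory update.

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_k(x, k):
    """Unbiased random-k sparsification, used here for both directions."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros(d)
    out[idx] = (d / k) * x[idx]
    return out

def grad(w, A, b):
    """Gradient of the local least-squares loss ||Aw - b||^2 / (2n)."""
    return A.T @ (A @ w - b) / len(b)

d, k, lr, alpha = 20, 4, 0.05, 0.25            # illustrative constants
workers = [(rng.normal(size=(30, d)), rng.normal(size=30)) for _ in range(4)]
h = [np.zeros(d) for _ in workers]             # per-worker uplink memories
w = np.zeros(d)

for step in range(500):
    # Uplink: each worker compresses the *difference* to its memory term,
    # so the compressed quantity shrinks as gradients stabilize.
    deltas = [rand_k(grad(w, A, b) - h_i, k) for (A, b), h_i in zip(workers, h)]
    g_hat = np.mean([h_i + d_i for h_i, d_i in zip(h, deltas)], axis=0)
    for h_i, d_i in zip(h, deltas):
        h_i += alpha * d_i                     # identical update on worker and server
    # Downlink: every node applies the same compressed model update.
    w = w - lr * rand_k(g_hat, k)

loss = np.mean([np.linalg.norm(A @ w - b)**2 / (2 * len(b)) for A, b in workers])
print("final average loss:", loss)
```

Compressing differences to a memory, rather than raw gradients, is what makes this kind of scheme workable in the non-i.i.d. setting: as the iterates approach the optimum, the compressed differences shrink, so the compression error shrinks with them.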