BEER: Fast O(1/T) Rate for Decentralized Nonconvex Optimization with Communication Compression

Jan-18-2025, 22:10:35 GMT–Neural Information Processing Systems

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle the communication bottleneck, there have been many efforts to design communication-compressed algorithms for decentralized nonconvex optimization, where the clients are only allowed to communicate a small amount of quantized information (aka bits) with their neighbors over a predefined graph topology. Despite significant efforts, the state-of-the-art algorithm in the nonconvex setting still suffers from a slower rate of convergence O((G/T) {2/3}) compared with their uncompressed counterpart, where G measures the data heterogeneity across different clients, and T is the number of communication rounds. This paper proposes BEER, which adopts communication compression with gradient tracking, and shows it converges at a faster rate of O(1/T) . This significantly improves over the state-of-the-art rate, by matching the rate without compression even under arbitrary data heterogeneity.

communication compression, data heterogeneity, decentralized nonconvex optimization, (3 more...)

Neural Information Processing Systems

Jan-18-2025, 22:10:35 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.43)