Appendix A Related works
In this section, we review closely related literature on decentralized optimization, communication-efficient algorithms, and communication compression.

Decentralized optimization
Decentralized optimization, a special class of linearly constrained (consensus-constrained) optimization problems, has been studied for a long time [3, 9]. Many centralized algorithms can be naturally converted into decentralized counterparts by using gossip averaging [14, 53], which mixes parameters from neighboring clients to enforce consensus. However, direct applications of gossip averaging often lead to either slow convergence or high error floors [34], and many fixes have been proposed in response [44, 56, 38, 7, 35]. Among them, gradient tracking [38, 7, 35], which applies the idea of dynamic average consensus [59] to global gradient estimation, provides a systematic approach to reducing this variance and has been successfully applied to decentralize many algorithms with faster convergence rates. A minimal sketch contrasting the two mechanisms is given at the end of this section.

Communication-efficient algorithms
While decentralized optimization is a classical topic, the focus on communication efficiency is relatively recent, driven by advances in large-scale machine learning.
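To make the contrast above concrete, the following is a minimal NumPy sketch, not taken from any of the cited works: the ring topology, the Metropolis mixing matrix W, the quadratic local losses, and the step size are all illustrative assumptions. It compares plain decentralized gradient descent with gossip averaging against its gradient-tracking variant on the same problem.

```python
import numpy as np

# Hypothetical setup: n clients on a ring, each with a local quadratic
# loss f_i(x) = 0.5 * a_i * ||x - b_i||^2, so grad f_i(x) = a_i * (x - b_i).
n, d, lr, T = 5, 3, 0.05, 500
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.0, size=n)
b = rng.normal(size=(n, d))
grad = lambda x, i: a[i] * (x - b[i])

# Doubly stochastic mixing matrix for a ring graph (Metropolis weights).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

# Plain gossip averaging: x_i <- sum_j W_ij x_j - lr * grad f_i(x_i).
x = np.zeros((n, d))
for _ in range(T):
    x = W @ x - lr * np.stack([grad(x[i], i) for i in range(n)])

# Gradient tracking: an auxiliary variable y_i tracks the global average
# gradient via dynamic average consensus:
#   y_i <- sum_j W_ij y_j + grad f_i(x_i^{t+1}) - grad f_i(x_i^t).
x_gt = np.zeros((n, d))
g = np.stack([grad(x_gt[i], i) for i in range(n)])
y = g.copy()
for _ in range(T):
    x_gt = W @ x_gt - lr * y
    g_new = np.stack([grad(x_gt[i], i) for i in range(n)])
    y = W @ y + g_new - g
    g = g_new

# Minimizer of the global loss sum_i f_i is the a-weighted mean of the b_i.
x_star = (a[:, None] * b).sum(0) / a.sum()
print("gossip error:  ", np.abs(x - x_star).max())
print("tracking error:", np.abs(x_gt - x_star).max())
```

Running the sketch shows the gossip iterates plateauing at a nonzero distance from the global minimizer under a constant step size, while the gradient-tracking iterates drive that distance toward zero, matching the error-floor discussion above.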