Variance Reduced ProxSkip: Algorithm, Theory and Application to Federated Learning

Neural Information Processing Systems 

We study distributed optimization methods based on the {\em local training (LT)} paradigm, i.e., methods which achieve communication efficiency by performing richer local gradient-based training on the clients before (expensive) parameter averaging is allowed to take place. While these methods were first proposed about a decade ago and form the algorithmic backbone of federated learning, there is an enormous gap between their practical performance and our theoretical understanding. Looking back at the progress of the field, we {\em identify 5 generations of LT methods}: 1) heuristic, 2) homogeneous, 3) sublinear, 4) linear, and 5) accelerated. The $5^{\rm th}$ generation was initiated by the ProxSkip method of Mishchenko et al. (2022), whose analysis provided the first theoretical confirmation that LT is a communication acceleration mechanism. Inspired by this recent progress, we contribute to the $5^{\rm th}$ generation of LT methods by showing that it is possible to enhance ProxSkip further using {\em variance reduction}.
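For context, a minimal sketch of the ProxSkip iteration, written here following the description in Mishchenko et al. (2022) for problems of the form $\min_x f(x) + \psi(x)$; the step size $\gamma$, the probability $p$ and the control variate $h_t$ below are that method's parameters, and this recursion is a paraphrase rather than a statement of the present paper's algorithm:
$$
\begin{aligned}
\hat{x}_{t+1} &= x_t - \gamma\bigl(\nabla f(x_t) - h_t\bigr), \\
x_{t+1} &= \begin{cases}
\operatorname{prox}_{\frac{\gamma}{p}\psi}\!\Bigl(\hat{x}_{t+1} - \tfrac{\gamma}{p}\, h_t\Bigr) & \text{with probability } p, \\[2pt]
\hat{x}_{t+1} & \text{with probability } 1-p,
\end{cases} \\
h_{t+1} &= h_t + \frac{p}{\gamma}\bigl(x_{t+1} - \hat{x}_{t+1}\bigr).
\end{aligned}
$$
In the federated reformulation, the proximal step plays the role of parameter averaging across clients, so evaluating it only with probability $p$ is what yields the communication savings; the enhancement announced in this abstract applies {\em variance reduction} to the gradient term $\nabla f(x_t)$ in such a recursion (the specific estimator is not spelled out at this point).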