FedAvg with Fine Tuning: Local Updates Lead to Representation Learning
The Federated Averaging (FedAvg) algorithm, which consists of alternating between a few local stochastic gradient updates at client nodes followed by a model averaging update at the server, is perhaps the most commonly used method in Federated Learning. Notwithstanding its simplicity, several empirical studies have shown that the model output by FedAvg generalizes well to new unseen tasks after a few fine-tuning steps. This surprising performance of such a simple method, however, is not fully understood from a theoretical point of view. In this paper, we formally investigate this phenomenon in the multi-task linear regression setting. We show that the reason behind the generalizability of the FedAvg output is FedAvg's power in learning the common data representation among the clients' tasks, by leveraging the diversity among client data distributions via multiple local updates between communication rounds. We formally establish the iteration complexity required by the clients for this result in the setting where the underlying shared representation is a linear map. To the best of our knowledge, this is the first result showing that FedAvg learns an expressive representation in any setting. Moreover, we show that multiple local updates between communication rounds are necessary for representation learning: distributed gradient methods that make only one local update between rounds provably cannot recover the ground-truth representation in the linear setting, and empirically yield neural network representations that generalize drastically worse to new clients than those learned by FedAvg trained on heterogeneous image classification datasets.
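As a concrete illustration of this setting, the minimal sketch below simulates the pipeline the abstract describes: clients whose data share a ground-truth linear representation B* composed with client-specific heads, plain FedAvg that alternates several local gradient steps with a server-side averaging step, and a new client that fine-tunes only its head on top of the frozen learned representation. All dimensions, step sizes, the noise level, the use of full-batch gradient descent for the local updates, and the choice to average both the representation and the head are illustrative assumptions rather than the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not the paper's exact choices).
d, k, n_clients, m = 20, 3, 50, 30    # ambient dim, representation dim, clients, samples per client
tau, eta, rounds = 10, 0.01, 200      # local steps per round, step size, communication rounds

# Ground-truth shared linear representation B* (d x k) and client heads w_i* (k,),
# so client i's data follows y = X B* w_i* + noise.
B_star, _ = np.linalg.qr(rng.normal(size=(d, k)))
W_star = rng.normal(size=(n_clients, k))
clients = []
for i in range(n_clients):
    X = rng.normal(size=(m, d))
    y = X @ B_star @ W_star[i] + 0.01 * rng.normal(size=m)
    clients.append((X, y))

def local_updates(B, w, X, y):
    """tau full-batch gradient steps on the client's squared loss (illustrative)."""
    for _ in range(tau):
        r = X @ B @ w - y                      # residuals on this client's data
        grad_B = X.T @ np.outer(r, w) / len(y)
        grad_w = B.T @ (X.T @ r) / len(y)
        B, w = B - eta * grad_B, w - eta * grad_w
    return B, w

# Plain FedAvg: each round, every client starts from the current global (B, w),
# runs tau local steps, and the server averages the returned parameters.
B = np.linalg.qr(rng.normal(size=(d, k)))[0]
w = rng.normal(size=k)
for _ in range(rounds):
    local_models = [local_updates(B.copy(), w.copy(), X, y) for X, y in clients]
    B = np.mean([Bi for Bi, _ in local_models], axis=0)
    w = np.mean([wi for _, wi in local_models], axis=0)

# Fine-tuning on a new, unseen client: freeze the learned representation B and
# fit only the k-dimensional head by least squares on a handful of samples.
w_new_star = rng.normal(size=k)
X_new = rng.normal(size=(m, d))
y_new = X_new @ B_star @ w_new_star + 0.01 * rng.normal(size=m)
w_hat, *_ = np.linalg.lstsq(X_new @ B, y_new, rcond=None)

# Principal-angle (subspace) distance between col(B) and col(B*): a small value
# means the shared representation was recovered up to rotation.
Q, _ = np.linalg.qr(B)
print("subspace distance to B*:", np.linalg.norm((np.eye(d) - Q @ Q.T) @ B_star, 2))
print("fine-tuned MSE on new client:", np.mean((X_new @ B @ w_hat - y_new) ** 2))
```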
Federated Learning (FL) [1] provides a communication-efficient and privacy-preserving means to learn from data distributed across clients such as cell phones, autonomous vehicles, and hospitals. FL aims for each client to benefit from collaborating in the learning process without sacrificing data privacy or paying a substantial communication cost. Federated Averaging (FedAvg) [1] is the predominant FL algorithm.
Heterogeneity-Guided Client Sampling: Towards Fast and Efficient Non-IID Federated Learning
This has motivated numerous studies aiming to reduce the variance and improve the convergence of FL on non-IID data [6, 9, 14, 17, 19, 30]. In addition, constraints on communication resources, and therefore on the number of clients that may participate in training, further complicate the implementation of FL schemes.
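To make the participation constraint concrete, the sketch below shows a generic partial-participation round in which the server samples a small cohort of clients and averages only their updates. The cohort size, the uniform default probabilities, and the optional per-client weighting hook (a stand-in for a heterogeneity score) are illustrative assumptions, not the sampling rule studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cohort(n_clients, cohort_size, scores=None):
    """Pick the clients that participate in this round.

    scores: optional per-client weights (e.g., a heterogeneity proxy); if None,
    clients are drawn uniformly. This weighting hook is an illustrative
    placeholder, not the paper's sampling strategy.
    """
    if scores is None:
        probs = np.full(n_clients, 1.0 / n_clients)
    else:
        probs = np.asarray(scores, dtype=float)
        probs /= probs.sum()
    return rng.choice(n_clients, size=cohort_size, replace=False, p=probs)

def aggregate(global_model, client_updates):
    """Apply the average of the model updates returned by the sampled cohort."""
    return global_model + np.mean(client_updates, axis=0)

# Usage: only 4 of 100 clients participate in a round (hypothetical numbers).
n_clients, dim = 100, 10
global_model = np.zeros(dim)
cohort = sample_cohort(n_clients, cohort_size=4)
updates = [rng.normal(size=dim) * 0.01 for _ in cohort]  # stand-in for local training
global_model = aggregate(global_model, updates)
```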