FedMAX: Mitigating Activation Divergence for Accurate and Communication-Efficient Federated Learning
Wei Chen, Kartikeya Bhardwaj, Radu Marculescu
In this paper, we identify a new phenomenon called activation-divergence, which occurs in Federated Learning (FL) due to data heterogeneity (i.e., non-IID data) across multiple users. Specifically, we argue that the activation vectors in FL can diverge, even if subsets of users share a few common classes with data residing on different devices. To address the activation-divergence issue, we introduce a prior based on the principle of maximum entropy; this prior assumes minimal information about the per-device activation vectors and aims to make the activation vectors of the same class as similar as possible across multiple devices. Our results show that, for both IID and non-IID settings, our proposed approach achieves better accuracy (due to the significantly more similar activation vectors across devices) and is more communication-efficient than state-of-the-art approaches in FL. Finally, we illustrate the effectiveness of our approach on a few common benchmarks and two large medical datasets.
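The abstract describes the maximum-entropy prior only at a high level. Below is a minimal PyTorch sketch of one way such a prior can be instantiated: penalizing the KL divergence between a uniform distribution and the softmax of each activation vector drives the activations toward maximum entropy, which discourages per-device divergence. The function names, the `beta` weight, and the choice of the final fully connected layer's input as the activation vector are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def max_entropy_regularizer(activations: torch.Tensor) -> torch.Tensor:
    """Zero-information prior on activation vectors.

    `activations` has shape (batch, d) and is assumed to be the input
    to the model's final fully connected layer. Minimizing
    KL(U || softmax(a_i)) pushes each activation distribution toward
    the uniform distribution U, i.e., toward maximum entropy.
    """
    log_probs = F.log_softmax(activations, dim=1)            # log softmax(a_i)
    uniform = torch.full_like(log_probs, 1.0 / activations.size(1))
    # F.kl_div expects log-probabilities as input and probabilities as
    # target; with target=U this is KL(U || softmax(a_i)), batch-averaged.
    return F.kl_div(log_probs, uniform, reduction="batchmean")

def local_loss(logits, targets, activations, beta: float = 1.0):
    """Per-device objective: cross-entropy plus the entropy-based prior."""
    return F.cross_entropy(logits, targets) + beta * max_entropy_regularizer(activations)
```

In a federated setting, each device would add this term to its local training objective before the usual model aggregation step; `beta` trades off task accuracy against activation similarity across devices.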
Apr-7-2020
- Country:
- North America
- Canada > Ontario
- Toronto (0.04)
- United States
- California > Santa Clara County
- San Jose (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.14)
- Texas > Travis County
- Austin (0.14)
- Virginia (0.04)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Health & Medicine > Diagnostic Medicine (0.46)