
 cinic-10


No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data

Neural Information Processing Systems

A central challenge in training classification models in real-world federated systems is learning with non-IID data. To cope with this, most existing works enforce regularization in local optimization or improve the model aggregation scheme at the server. Other works share public datasets or synthesized samples to supplement the training of under-represented classes, or introduce a certain level of personalization. Though effective, they lack a deep understanding of how data heterogeneity affects each layer of a deep classification model. In this paper, we bridge this gap by performing an experimental analysis of the representations learned by different layers. Our observations are surprising: (1) there exists a greater bias in the classifier than in other layers, and (2) the classification performance can be significantly improved by post-calibrating the classifier after federated training. Motivated by these findings, we propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model. Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10. We hope that our simple yet effective method can shed some light on future research in federated learning with non-IID data.
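The calibration idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that per-class feature means and covariances have already been estimated, that the classifier is a single linear softmax layer, and that plain gradient descent is used. The function names `sample_virtual_features` and `calibrate_classifier`, the toy statistics, and all hyperparameters are hypothetical.

```python
import numpy as np

def sample_virtual_features(means, covs, counts, rng):
    """Draw counts[c] virtual feature vectors for class c from N(means[c], covs[c])."""
    feats, labels = [], []
    for c, (mu, sigma, n) in enumerate(zip(means, covs, counts)):
        feats.append(rng.multivariate_normal(mu, sigma, size=n))
        labels.append(np.full(n, c))
    return np.vstack(feats), np.concatenate(labels)

def calibrate_classifier(W, b, feats, labels, lr=0.1, epochs=200):
    """Retrain a linear softmax classifier on virtual features (assumed setup)."""
    n, num_classes = feats.shape[0], W.shape[1]
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # cross-entropy gradient w.r.t. logits
        W = W - lr * feats.T @ grad
        b = b - lr * grad.sum(axis=0)
    return W, b

# Toy class statistics (hypothetical): two well-separated classes in 2-D.
rng = np.random.default_rng(0)
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
covs = np.array([np.eye(2) * 0.25, np.eye(2) * 0.25])
feats, labels = sample_virtual_features(means, covs, counts=[200, 200], rng=rng)

# A deliberately biased classifier that always predicts class 1.
W0 = np.zeros((2, 2))
b0 = np.array([0.0, 5.0])
W, b = calibrate_classifier(W0, b0, feats, labels)

acc_before = ((feats @ W0 + b0).argmax(axis=1) == labels).mean()
acc_after = ((feats @ W + b).argmax(axis=1) == labels).mean()
```

Only the classifier parameters are updated here; the feature extractor is left untouched, which mirrors the abstract's observation that the classifier carries more bias than the other layers.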






A.1 The Pólya-Gamma augmentation. A random variable $\omega$ has a Pólya-Gamma distribution if it can be written as an infinite sum of independent gamma random variables: $\omega \overset{D}{=} \frac{1}{2\pi^2} \sum \dots$

Neural Information Processing Systems

Given a training dataset D = (X, y) of features and corresponding labels from classes {1, ..., T}, D is partitioned recursively into two subsets, according to classes, at each tree level until reaching leaf nodes that contain data from only one class. More concretely, feature vectors for all samples are first obtained (using a NN); then, for each class, a class prototype is generated by averaging the feature vectors belonging to that class.
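A minimal sketch of the prototype computation and the recursive class partition follows. The excerpt does not say how the classes are divided at each node, so the split rule used here (order the prototypes along their leading principal direction and cut in the middle) is purely an assumption for illustration, as are the names `class_prototypes`, `build_tree`, and `leaves` and the toy features.

```python
import numpy as np

def class_prototypes(features, labels):
    """Average the feature vectors of each class; returns {class: prototype}."""
    return {int(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}

def build_tree(classes, protos):
    """Recursively split a list of classes in two until each leaf holds one class.

    Assumed split rule (not from the source): project the class prototypes onto
    their leading principal direction and cut the ordered classes in the middle.
    """
    if len(classes) == 1:
        return classes[0]
    P = np.stack([protos[c] for c in classes])
    centered = P - P.mean(axis=0)
    # The first right-singular vector is the direction of largest prototype spread.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    order = np.argsort(centered @ vt[0])
    half = len(classes) // 2
    left = [classes[i] for i in order[:half]]
    right = [classes[i] for i in order[half:]]
    return (build_tree(left, protos), build_tree(right, protos))

def leaves(node):
    """Collect the class labels stored at the leaves of a (sub)tree."""
    if isinstance(node, tuple):
        return leaves(node[0]) + leaves(node[1])
    return [node]

# Toy features standing in for NN embeddings: classes 1 and 2 lie close together,
# as do classes 3 and 4.
features = np.array([[0.0, 0.0], [0.0, 0.2], [0.1, 0.0], [0.1, 0.2],
                     [10.0, 0.0], [10.0, 0.2], [10.1, 0.0], [10.1, 0.2]])
labels = np.array([1, 1, 2, 2, 3, 3, 4, 4])

protos = class_prototypes(features, labels)
tree = build_tree(sorted(protos), protos)
top_split = {frozenset(leaves(tree[0])), frozenset(leaves(tree[1]))}
```

In this toy case the first split groups the nearby prototypes together, and every leaf ends up holding exactly one class.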




We thank the reviewers for taking the time to write these thorough reviews and their appreciation of BatchBALD as a

Neural Information Processing Systems

We refer to reviewers 1, 2 and 3 as R1, R2 and R3. R1-(5): We use the 25% and 75% quartiles for the shaded areas, see line 147 in the paper. R2 - Originality: Thank you for pointing us to additional relevant related work; we have added citations. We provide additional results on CINIC-10 (top figure, left). We use 50 MC dropout samples, an acquisition size of 10, and 6 trials.