 Statistical Learning




No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data

Neural Information Processing Systems

A central challenge in training classification models in real-world federated systems is learning with non-IID data. To cope with this, most existing works enforce regularization in local optimization or improve the model aggregation scheme at the server. Other works share public datasets or synthesized samples to supplement the training of under-represented classes, or introduce a certain level of personalization. Though effective, they lack a deep understanding of how data heterogeneity affects each layer of a deep classification model. In this paper, we bridge this gap by performing an experimental analysis of the representations learned by different layers. Our observations are surprising: (1) there exists a greater bias in the classifier than in other layers, and (2) the classification performance can be significantly improved by post-calibrating the classifier after federated training. Motivated by the above findings, we propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model. Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10. We hope that our simple yet effective method can shed some light on future research on federated learning with non-IID data.
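The calibration idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-class feature statistics have already been aggregated, uses one diagonal Gaussian per class (a simplification), and fine-tunes only a linear softmax classifier with plain SGD. All names (`sample_virtual_features`, `calibrate_classifier`, `class_stats`) are hypothetical.

```python
import math
import random


def sample_virtual_features(class_stats, n_per_class, rng):
    """Draw virtual features from one diagonal Gaussian per class.

    class_stats maps an integer label to (mean vector, per-dimension std).
    The diagonal covariance is an assumption of this sketch.
    """
    feats, labels = [], []
    for label in sorted(class_stats):
        mean, std = class_stats[label]
        for _ in range(n_per_class):
            feats.append([rng.gauss(m, s) for m, s in zip(mean, std)])
            labels.append(label)
    return feats, labels


def logits_for(weights, biases, x):
    """Linear classifier scores, one per class."""
    return [sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def accuracy(weights, biases, feats, labels):
    hits = 0
    for x, y in zip(feats, labels):
        scores = logits_for(weights, biases, x)
        hits += int(scores.index(max(scores)) == y)
    return hits / len(labels)


def calibrate_classifier(weights, biases, feats, labels, rng, lr=0.1, epochs=20):
    """Fine-tune a copy of the classifier with SGD on cross-entropy.

    Only the classifier is updated; the feature extractor is assumed frozen.
    """
    weights = [row[:] for row in weights]
    biases = list(biases)
    order = list(range(len(feats)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = feats[i], labels[i]
            scores = logits_for(weights, biases, x)
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            for c in range(len(weights)):
                grad = exps[c] / total - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for d in range(len(x)):
                    weights[c][d] -= lr * grad * x[d]
    return weights, biases
```

As a usage example, a classifier whose biases always favour class 0 can be passed to `calibrate_classifier` together with virtual features drawn for two well-separated classes; comparing `accuracy` before and after shows the effect of calibrating the classifier alone.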



2e9f978b222a956ba6bdf427efbd9ab3-Supplemental.pdf

Neural Information Processing Systems

B.3 Derivations of Eq. (19) Similar to the derivation above, we give the gradient with respect to the weight vector $w \in \mathbb{R}^M_+$, which is given by $\nabla_w D_{\mathrm{KL}} = -\nabla_w \log Z(U,w) - \nabla_w \mathbb{E}_{U,w}(\log p_\theta(X \mid z))^T 1_N + \nabla_w \mathbb{E}_{U,w}(\log p_\theta(U \mid z))^T w$. The learning rate of each stochastic gradient descent step is $\gamma_t \propto t^{-1}$, where $t \in \{1,\dots,T\}$ denotes the iteration for optimization. We already report the t-SNE visualization of ByPE-VAE and standard VAE in Figure. Here we give more t-SNE visualization results. First, we randomly sample from ByPE-VAEs trained on different datasets, namely, MNIST, Fashion MNIST, and CelebA, as shown in Fig. 7.



Assaying Out-Of-Distribution Generalization in Transfer Learning

Neural Information Processing Systems

Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and providing recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions.
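Calibration error, one of the metrics listed above, is commonly summarized as the expected calibration error (ECE). The following is a minimal sketch of the usual equal-width binning estimate; the bin count and the function name are assumptions, and the abstract does not state which estimator the study uses.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over bins.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: 1 if the prediction was right, else 0.
    """
    # Each bin stores [count, sum of confidences, sum of correct flags].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += hit
    n_total = len(confidences)
    ece = 0.0
    for count, conf_sum, hit_sum in bins:
        if count:
            gap = abs(hit_sum / count - conf_sum / count)
            ece += (count / n_total) * gap
    return ece
```

For instance, two predictions at confidence 0.9 that are both correct and two at 0.6 of which one is correct give a gap of 0.1 in each occupied bin, so the estimate is about 0.1.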


Experiments and Additional Results

Neural Information Processing Systems

Note that $f(x, c_1, c_2, \cdot)$ is strongly concave for any $(x, c_1, c_2) \in \mathbb{R}^{d+2}$. Impact of the Local Steps: In this section, we run additional experiments to investigate the impact of the local steps K on the training performance. We run FSGDA and SAGDA over the heterogeneous "a9a" [40] dataset with the regression model mentioned in Section 4. We fix the local step-size at 0.01 and the worker number at 100, and choose the number of local update rounds K from the discrete set {2, 10, 20}. This is due to the fact that the algorithm needs more communication rounds when K is small, which matches our Corollary 2 and Corollary 3. Impact of the Local Step-size: In this experiment, we choose the value of the local step-sizes from the discrete set {0.0001, 0.001, 0.01}. As shown in Figure 1(a) and Fig. 6(a), larger local step-sizes lead to faster convergence rates. Impact of the Global Step-size: We choose the global step-size from the discrete set {2, 5, 10} and fix the worker number at 100 and the local update rounds at 10.



Online Lazy Gradient Descent is Universal on Strongly Convex Domains

Neural Information Processing Systems

We study Online Lazy Gradient Descent for optimisation on a strongly convex domain. The algorithm is known to achieve $O(\sqrt{N})$ regret against adversarial opponents; here we show it is universal in the sense that it also achieves $O(\log N)$ expected regret against i.i.d. opponents. This improves upon the more complex meta-algorithm of Huang et al. [20] that only gets $O(\sqrt{N \log N})$ and $O(\log N)$ bounds. In addition we show that, unlike for the simplex, order bounds for pseudo-regret and expected regret are equivalent for strongly convex domains.
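A minimal sketch of lazy (dual-averaging) gradient descent may help fix ideas. It assumes linear costs, the Euclidean unit ball as the strongly convex domain, and a step size of $1/\sqrt{N}$; these choices and the function names are illustrative and are not taken from the paper.

```python
import math
import random


def project_to_unit_ball(v):
    """Euclidean projection onto the unit ball."""
    norm = math.sqrt(sum(c * c for c in v))
    return list(v) if norm <= 1.0 else [c / norm for c in v]


def lazy_gradient_descent_regret(costs, step_size):
    """Play lazy gradient descent against linear costs and return the regret.

    "Lazy" means each action is the projection of the unprojected,
    scaled sum of all past cost vectors, not of the previous action.
    """
    dim = len(costs[0])
    cumulative = [0.0] * dim
    incurred = 0.0
    for cost in costs:
        action = project_to_unit_ball([-step_size * c for c in cumulative])
        incurred += sum(a * x for a, x in zip(cost, action))
        cumulative = [c + a for c, a in zip(cumulative, cost)]
    # The best fixed action in the unit ball points against the cost sum.
    best_fixed = -math.sqrt(sum(c * c for c in cumulative))
    return incurred - best_fixed


def iid_costs(n_rounds, rng):
    """Hypothetical i.i.d. opponent: mean cost (1, 0) plus uniform noise."""
    return [[1.0 + rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
            for _ in range(n_rounds)]
```

Calling `lazy_gradient_descent_regret(iid_costs(n, rng), 1 / math.sqrt(n))` for increasing `n` gives a rough empirical view of how the regret grows against such an opponent under these assumptions.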