AITopics | Statistical Learning

A central challenge in training classification models in real-world federated systems is learning with non-IID data. To cope with this, most existing works either enforce regularization in local optimization or improve the model aggregation scheme at the server. Other works share public datasets or synthesized samples to supplement the training of under-represented classes, or introduce a certain level of personalization. Though effective, these approaches lack a deep understanding of how data heterogeneity affects each layer of a deep classification model. In this paper, we bridge this gap by performing an experimental analysis of the representations learned by different layers. Our observations are surprising: (1) the classifier exhibits a greater bias than the other layers, and (2) classification performance can be significantly improved by post-calibrating the classifier after federated training. Motivated by these findings, we propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model. Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks, including CIFAR-10, CIFAR-100, and CINIC-10. We hope that our simple yet effective method can shed some light on future research in federated learning with non-IID data.
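
The calibration idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each client is assumed to report, per class, its sample count, feature mean, and unbiased feature covariance; the server pools these into one Gaussian per class (the mixture components), samples virtual features from them, and refits a softmax linear head on the virtual features by plain gradient descent. The function names, the zero default initialization of the head, and the hyperparameters (`per_class`, `lr`, `steps`) are hypothetical; in practice one would presumably pass the federated model's classifier weights as `w0`/`b0`.

```python
import numpy as np

def client_class_stats(features, labels, num_classes):
    """Client side: per-class (count, mean, unbiased covariance) of local features."""
    stats = {}
    for c in range(num_classes):
        fc = features[labels == c]
        if len(fc) < 2:  # too few samples for an unbiased covariance; skip this class
            continue
        stats[c] = (len(fc), fc.mean(axis=0), np.cov(fc, rowvar=False))
    return stats

def aggregate_class_stats(all_client_stats, num_classes):
    """Server side: pool client moments into exact global per-class mean and covariance."""
    global_stats = {}
    for c in range(num_classes):
        parts = [s[c] for s in all_client_stats if c in s]
        if not parts:
            continue
        n = sum(nk for nk, _, _ in parts)
        mu = sum(nk * mk for nk, mk, _ in parts) / n
        # Sum of x x^T over all samples, recovered from each client's (N_k, mu_k, Sigma_k)
        scatter = sum((nk - 1) * sk + nk * np.outer(mk, mk) for nk, mk, sk in parts)
        cov = (scatter - n * np.outer(mu, mu)) / (n - 1)
        global_stats[c] = (n, mu, cov)
    return global_stats

def calibrate_classifier(global_stats, dim, num_classes, per_class=200,
                         lr=0.1, steps=300, w0=None, b0=None, seed=0):
    """Sample virtual features from each class Gaussian and refit a softmax linear head."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for c, (_, mu, cov) in global_stats.items():
        xs.append(rng.multivariate_normal(mu, cov, size=per_class))
        ys.append(np.full(per_class, c))
    x, y = np.concatenate(xs), np.concatenate(ys)
    onehot = np.eye(num_classes)[y]
    w = np.zeros((dim, num_classes)) if w0 is None else w0.copy()
    b = np.zeros(num_classes) if b0 is None else b0.copy()
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(x)  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b
```

Because the pooled mean and covariance are exact functions of the per-client moments, the server in this sketch only ever sees per-class statistics, never raw features.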

artificial intelligence, classifier, machine learning

Neural Information Processing Systems

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data

Neural Information Processing Systems, Apr-25-2026, 08:11:41 GMT

artificial intelligence, classifier, machine learning

Neural Information Processing Systems

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

2e9f978b222a956ba6bdf427efbd9ab3-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 07:58:20 GMT

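B.3 Derivations of Eq. (19)

Similar to the derivation above, we give the gradient with respect to the weight vector $\mathbf{w} \in \mathbb{R}^{M}_{+}$, which is given by

$$\nabla_{\mathbf{w}} D_{\mathrm{KL}} = -\nabla_{\mathbf{w}} \log Z(\mathbf{U},\mathbf{w}) - \nabla_{\mathbf{w}} \mathbb{E}_{\mathbf{U},\mathbf{w}}\big[(\log p_\theta(\mathbf{X} \mid z))^\top \mathbf{1}_N\big] + \nabla_{\mathbf{w}} \mathbb{E}_{\mathbf{U},\mathbf{w}}\big[(\log p_\theta(\mathbf{U} \mid z))^\top \mathbf{w}\big].$$

The learning rate of each stochastic gradient descent step is $\gamma_t \propto t^{-1}$, where $t \in \{1, \dots, T\}$ denotes the optimization iteration.

We already report the t-SNE visualization of ByPE-VAE and the standard VAE in Figure. Here we give more t-SNE visualization results. First, we randomly sample from ByPE-VAEs trained on different datasets, namely MNIST, Fashion-MNIST, and CelebA, as shown in Fig. 7.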
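
A Monte Carlo estimate of this gradient can be sketched under one assumption the excerpt does not state: that the weighted posterior has the usual Bayesian (pseudo)coreset form $q_{\mathbf{w}}(z) \propto p(z)\exp(\mathbf{w}^\top \log p_\theta(\mathbf{U}\mid z))$. Then $\nabla_{\mathbf{w}}\log Z(\mathbf{U},\mathbf{w}) = \mathbb{E}_{\mathbf{U},\mathbf{w}}[\log p_\theta(\mathbf{U}\mid z)]$ and $\nabla_{\mathbf{w}}\mathbb{E}_{\mathbf{U},\mathbf{w}}[g(z)] = \mathrm{Cov}_{\mathbf{U},\mathbf{w}}[\log p_\theta(\mathbf{U}\mid z), g(z)]$, so the three terms collapse to $-\mathrm{Cov}_{\mathbf{U},\mathbf{w}}[\log p_\theta(\mathbf{U}\mid z), (\log p_\theta(\mathbf{X}\mid z))^\top\mathbf{1}_N - (\log p_\theta(\mathbf{U}\mid z))^\top\mathbf{w}]$. The sketch takes log-likelihoods evaluated at samples $z_s \sim q_{\mathbf{w}}$ (drawing those samples is left to the caller) and applies a projected step with $\gamma_t = \gamma_0 / t$; the function names and `gamma0` are hypothetical.

```python
import numpy as np

def kl_grad_estimate(log_lik_u, log_lik_x, w):
    """Monte Carlo estimate of -Cov_q[f_U, f_X^T 1_N - f_U^T w] from samples z_s ~ q_w.

    log_lik_u: (S, M) array with entries log p(u_m | z_s) for the M pseudo-points.
    log_lik_x: (S, N) array with entries log p(x_n | z_s) for the N data points.
    """
    f_u = log_lik_u - log_lik_u.mean(axis=0)          # centred pseudo-point log-likelihoods
    residual = log_lik_x.sum(axis=1) - log_lik_u @ w   # f_X^T 1_N - f_U^T w for each sample
    residual = residual - residual.mean()
    # Unbiased sample covariance between each column of f_U and the residual
    return -(f_u * residual[:, None]).sum(axis=0) / (len(residual) - 1)

def sgd_step(w, grad, t, gamma0=0.1):
    """Projected SGD step with gamma_t = gamma0 / t, clipping w back onto R^M_+."""
    return np.maximum(w - (gamma0 / t) * grad, 0.0)
```

When the weighted pseudo-point log-likelihoods reproduce the full-data log-likelihood for every sample, the residual vanishes and so does the estimated gradient.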

artificial intelligence, fashion mnist, machine learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

2e9f978b222a956ba6bdf427efbd9ab3-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 07:58:17 GMT

artificial intelligence, bype-vae, machine learning

Neural Information Processing Systems

Country: Asia > China (0.47)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Assaying Out-Of-Distribution Generalization in Transfer Learning

Neural Information Processing Systems, Apr-25-2026, 07:58:05 GMT

Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) have been studied across different research programs, resulting in different recommendations. While these approaches share the same aspirational goal, they have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and provide recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions.
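
One of the listed metrics, calibration error, is commonly measured as the expected calibration error (ECE): the gap between accuracy and mean confidence inside equal-width confidence bins, weighted by how many samples fall in each bin. The sketch below assumes that binned definition, 15 bins, and confidences given as the model's top-class probabilities; the excerpt does not say which estimator or bin count the study uses.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Binned ECE: sum over bins of (bin fraction) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)  # right-closed bins; confidence 0 is ignored
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in this bin
    return ece
```

For example, ten predictions made with confidence 0.8 of which eight are correct give an ECE of 0, while four predictions made with confidence 1.0 of which two are correct give 0.5.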

artificial intelligence, machine learning, robustness

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (0.34)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Experiments and Additional Results

Neural Information Processing Systems, Apr-25-2026, 07:56:51 GMT

Note that $f(x, c_1, c_2, \cdot)$ is strongly concave for any $(x, c_1, c_2) \in \mathbb{R}^{d+2}$.

Impact of the Local Steps: In this section, we run additional experiments to investigate the impact of the number of local steps K on training performance. We run FSGDA and SAGDA on the heterogeneous "a9a" [40] dataset with the regression model described in Section 4. We fix the local step-size at 0.01 and the number of workers at 100, and choose the number of local update rounds K from the discrete set {2, 10, 20}. The algorithm needs more communication rounds when K is small, which matches our Corollary 2 and Corollary 3.

Impact of the Local Step-size: In this experiment, we choose the local step-size from the discrete set {0.0001, 0.001, 0.01}. As shown in Figure 1(a) and Figure 6(a), larger local step-sizes lead to faster convergence.

Impact of the Global Step-size: We choose the global step-size from the discrete set {2, 5, 10} and fix the number of workers at 100 and the number of local update rounds at 10.
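
The roles of K, the local step-size, and the global step-size can be illustrated with a generic federated gradient descent ascent loop. This is a hypothetical toy sketch, not FSGDA or SAGDA as specified in the paper: each worker runs K simultaneous descent (on x) and ascent (on y) steps with the local step-size, starting from the current global point, and the server moves the global point by the global step-size times the averaged local displacement. The quadratic client objective $f_i(x, y) = \tfrac12 (x - c_i)^2 + xy - \tfrac12 y^2$ (strongly convex in x, strongly concave in y), the per-worker offsets $c_i$ standing in for data heterogeneity, and all function names are assumptions for illustration.

```python
import numpy as np

def client_grad(x, y, c):
    """Gradients of the toy client objective f_i(x, y) = 0.5*(x - c)**2 + x*y - 0.5*y**2."""
    return (x - c) + y, x - y  # (df/dx, df/dy)

def federated_sgda(cs, rounds=100, local_steps=10, local_lr=0.01,
                   global_lr=2.0, noise=0.0, seed=0):
    """Toy federated GDA: K local steps per worker, then a scaled server update."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    for _ in range(rounds):
        dx_sum, dy_sum = 0.0, 0.0
        for c in cs:
            xi, yi = x, y
            for _ in range(local_steps):
                gx, gy = client_grad(xi, yi, c)
                if noise:  # optional Gaussian noise to mimic stochastic gradients
                    gx += noise * rng.standard_normal()
                    gy += noise * rng.standard_normal()
                xi, yi = xi - local_lr * gx, yi + local_lr * gy  # descent on x, ascent on y
            dx_sum += xi - x
            dy_sum += yi - y
        # Server: global step-size times the average local displacement
        x += global_lr * dx_sum / len(cs)
        y += global_lr * dy_sum / len(cs)
    return x, y
```

With 100 workers whose offsets average to 1, the saddle point of the averaged objective is x = y = 0.5, and the loop converges there. In this toy every client shares the same curvature, so the heterogeneous offsets cause no client drift; varying `local_steps`, `local_lr`, or `global_lr` mimics the ablations above, though real heterogeneous problems generally do exhibit drift.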

artificial intelligence, machine learning, xfi

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

2f13806d6580db60d9d7d6f89ba529ca-Paper-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 07:56:48 GMT

artificial intelligence, arxiv preprint arxiv, machine learning

Neural Information Processing Systems

Country: North America > United States > Ohio (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Online Lazy Gradient Descent is Universal on Strongly Convex Domains

Neural Information Processing Systems, Apr-25-2026, 07:56:36 GMT

We study Online Lazy Gradient Descent for optimisation on a strongly convex domain. The algorithm is known to achieve O(√N) regret against adversarial opponents; here we show it is universal in the sense that it also achieves O(log N) expected regret against i.i.d. opponents. This improves upon the more complex meta-algorithm of Huang et al. [20], which only gets O(√(N log N)) and O(log N) bounds. In addition, we show that, unlike for the simplex, order bounds for pseudo-regret and expected regret are equivalent for strongly convex domains.
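
Lazy gradient descent (equivalently, dual averaging or follow-the-regularized-leader with a squared Euclidean regularizer) plays the projection of the negatively scaled sum of all past gradients, instead of projecting a step taken from the previous iterate. A minimal sketch for linear losses on the Euclidean unit ball, one example of a strongly convex domain; the constant step size `eta`, the radius, and the function names are assumptions, not the paper's setup.

```python
import numpy as np

def project_ball(v, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)

def lazy_gd(loss_vectors, eta, radius=1.0):
    """Play x_t = Proj(-eta * sum of past loss vectors); return plays and total linear loss."""
    grad_sum = np.zeros(loss_vectors.shape[1])
    plays, total = [], 0.0
    for g in loss_vectors:
        x = project_ball(-eta * grad_sum, radius)  # lazy: project the accumulated sum
        plays.append(x)
        total += float(g @ x)
        grad_sum += g
    return np.array(plays), total

def regret(loss_vectors, total, radius=1.0):
    """Regret against the best fixed point in the ball: min_x <G, x> = -radius * ||G||."""
    return total + radius * np.linalg.norm(loss_vectors.sum(axis=0))
```

With the constant loss vector (1, 0) for 10 rounds and `eta=1.0`, the first play is the origin and every later play is (-1, 0), so the total loss is -9 against a best fixed loss of -10, giving regret 1. With i.i.d. losses of nonzero mean, the accumulated sum concentrates around N times the mean, so the plays settle near the boundary point that minimizes the expected loss.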

algorithm, artificial intelligence, machine learning

Neural Information Processing Systems

Country: