AITopics | ensemble distillation

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

Neural Information Processing Systems | Jun-13-2026, 09:27:39 GMT

Model-Heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data local. To better aggregate knowledge from clients, ensemble distillation, a widely used and effective technique, is often employed after global aggregation to enhance the performance of the global model. However, simply combining Hetero-FL and ensemble distillation does not always yield promising results and can make the training process unstable. The reason is that existing methods primarily focus on logit distillation, which, while model-agnostic because it operates on softmax predictions, fails to compensate for the knowledge bias arising from heterogeneous models. To tackle this challenge, we propose a stable and efficient Feature Distillation method for model-heterogeneous Federated learning, dubbed FedFD, that incorporates aligned feature information via orthogonal projection to better integrate knowledge from heterogeneous models. Specifically, a new feature-based ensemble federated knowledge distillation paradigm is proposed. The global model on the server maintains a projection layer for each client-side model architecture to align the features separately. Orthogonal techniques are employed to re-parameterize the projection layer to mitigate knowledge bias from heterogeneous models and thus maximize the distilled knowledge. Extensive experiments show that FedFD achieves superior performance compared to state-of-the-art methods.
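The abstract describes aligning server features with client features through an orthogonally re-parameterized projection layer. The sketch below is only an illustration of that idea, not the FedFD implementation: the abstract does not say how orthogonality is enforced or how the target is built, so Gram-Schmidt orthonormalization, the mean-of-client-features target, the mean-squared-error loss, and all function names here are assumptions.

```python
import math


def gram_schmidt_rows(weight):
    """Orthonormalize the rows of `weight` (modified Gram-Schmidt).

    Assumption: this stands in for the "orthogonal re-parameterization"
    named in the abstract; the paper may use a different technique.
    """
    basis = []
    for row in weight:
        v = list(row)
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm < 1e-12:
            raise ValueError("rows are linearly dependent")
        basis.append([vi / norm for vi in v])
    return basis


def project(weight, feature):
    """Matrix-vector product: map a server feature into a client feature space."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weight]


def feature_distillation_loss(raw_weight, server_feature, client_features):
    """MSE between the projected server feature and the mean client feature.

    `client_features` holds features from clients sharing one architecture
    (hypothetical setup: one projection layer per architecture).
    """
    w = gram_schmidt_rows(raw_weight)
    projected = project(w, server_feature)
    n = len(client_features)
    target = [sum(f[i] for f in client_features) / n for i in range(len(projected))]
    return sum((p - t) ** 2 for p, t in zip(projected, target)) / len(projected)
```

Here `raw_weight` has one row per client feature dimension and one column per server feature dimension, so a 2 x 3 matrix maps a 3-dimensional server feature onto a 2-dimensional client feature space. In a real training loop the raw weights would be learned and the orthonormalization re-applied at each step.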

artificial intelligence, machine learning, proceedings, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

18df51b97ccd68128e994804f3eccc87-Supplemental.pdf

Neural Information Processing Systems | Feb-7-2026, 15:34:32 GMT

communication round, distillation, feddf, (12 more...)

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada (0.04)
(3 more...)

Genre: Research Report (0.46)

Industry:

Education (0.95)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)

Ensemble Distillation for Robust Model Fusion in Federated Learning

Neural Information Processing Systems | Dec-23-2025, 19:28:52 GMT

Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized. In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side. However, directly averaging model parameters is only possible if all models have the same structure and size, which could be a restrictive constraint in many scenarios. In this work we investigate more powerful and more flexible aggregation schemes for FL. Specifically, we propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients. This knowledge distillation technique mitigates privacy risk and cost to the same extent as the baseline FL algorithms, but allows flexible aggregation over heterogeneous client models that can differ e.g. in size, numerical precision or structure. We show in extensive empirical experiments on various CV/NLP datasets (CIFAR-10/100, ImageNet, AG News, SST2) and settings (heterogeneous models/data) that the server model can be trained much faster, requiring fewer communication rounds than any existing FL technique so far.
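The fusion step described above, training the central classifier on unlabeled data using the outputs of the client models, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the teacher signal is the softmax of the averaged client logits and that the server is trained with a KL-divergence loss; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_teacher(client_logits):
    """Average the client logits for one unlabeled sample, then apply softmax.

    Assumption: logit averaging is used to form the teacher distribution.
    """
    n = len(client_logits)
    num_classes = len(client_logits[0])
    avg = [sum(l[c] for l in client_logits) / n for c in range(num_classes)]
    return softmax(avg)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(client_logits, server_logits):
    """Loss pulling the server prediction toward the client ensemble."""
    teacher = ensemble_teacher(client_logits)
    student = softmax(server_logits)
    return kl_divergence(teacher, student)
```

Because only logits are exchanged at this step, the clients may use different architectures as long as they share the same output classes, which is what allows aggregation across heterogeneous models.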

ensemble distillation, federated learning, robust model fusion, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Credal Ensemble Distillation for Uncertainty Quantification

Wang, Kaizheng, Cuzzolin, Fabio, Moens, David, Hallez, Hans

arXiv.org Artificial Intelligence | Nov-19-2025

Deep ensembles (DE) have emerged as a powerful approach for quantifying predictive uncertainty and distinguishing its aleatoric and epistemic components, thereby enhancing model robustness and reliability. However, their high computational and memory costs during inference pose significant challenges for wide practical deployment. To overcome this issue, we propose credal ensemble distillation (CED), a novel framework that compresses a DE into a single model, CREDIT, for classification tasks. Instead of a single softmax probability distribution, CREDIT predicts class-wise probability intervals that define a credal set, a convex set of probability distributions, for uncertainty quantification. Empirical results on out-of-distribution detection benchmarks demonstrate that CED achieves superior or comparable uncertainty estimation compared to several existing baselines, while substantially reducing inference overhead compared to DE.
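To make the notion of class-wise probability intervals concrete, the sketch below derives lower and upper bounds from the softmax outputs of ensemble members and tests whether a distribution lies in the resulting credal set. It is an illustration only: the abstract does not state how CREDIT's interval targets are constructed, so taking the per-class minimum and maximum over members is an assumption, and the function names are hypothetical.

```python
def probability_intervals(member_probs):
    """Per-class lower and upper probabilities across ensemble members.

    Assumption: the interval for each class is [min, max] over the members.
    """
    num_classes = len(member_probs[0])
    lower = [min(p[c] for p in member_probs) for c in range(num_classes)]
    upper = [max(p[c] for p in member_probs) for c in range(num_classes)]
    return lower, upper


def in_credal_set(dist, lower, upper, tol=1e-9):
    """True if `dist` sums to one and respects every class-wise interval."""
    sums_to_one = abs(sum(dist) - 1.0) <= tol
    within = all(lo - tol <= d <= hi + tol for d, lo, hi in zip(dist, lower, upper))
    return sums_to_one and within
```

Wider intervals mean the ensemble members disagree more about a class, which is the kind of signal an out-of-distribution detector can use.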

artificial intelligence, machine learning, prediction, (17 more...)

2511.13766

Country: Europe > Belgium (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Ensemble Distillation for Robust Model Fusion in Federated Learning

Tao Lin

Neural Information Processing Systems | Oct-2-2025, 06:58:20 GMT

Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized.

artificial intelligence, communication round, machine learning, (13 more...)

Country: North America > United States (0.46)

Genre: Research Report (0.46)

Industry:

Education (0.95)
Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Ensemble Distillation for Robust Model Fusion in Federated Learning

Neural Information Processing Systems | May-26-2025, 17:13:20 GMT

Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized. In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side. However, directly averaging model parameters is only possible if all models have the same structure and size, which could be a restrictive constraint in many scenarios. In this work we investigate more powerful and more flexible aggregation schemes for FL. Specifically, we propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients. This knowledge distillation technique mitigates privacy risk and cost to the same extent as the baseline FL algorithms, but allows flexible aggregation over heterogeneous client models that can differ e.g. in size, numerical precision or structure.

artificial intelligence, federated learning, machine learning, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Review for NeurIPS paper: Ensemble Distillation for Robust Model Fusion in Federated Learning

Neural Information Processing Systems | Jan-22-2025, 03:56:20 GMT

Strengths: This work demonstrates a solid understanding of the key requirements and challenges of federated learning, and thus presents a practical solution with significant improvements. The contribution of this paper is a robust, efficient training scheme for FL with extensive results and analysis, which is relevant to the NeurIPS community. The authors provide sufficient justification for why the additional computations are negligible in practice and why FedDF's reduced number of communication rounds and ability to handle architecture heterogeneity matter more. They analyze its contribution from various angles, including efficiency, use of the heterogeneous computation resources of clients, robustness to the choice of distillation dataset, and handling of heterogeneous client data by mitigating the quality loss of batch normalization under different data distributions. The results are sensible and believable.

distillation dataset, ensemble distillation, robust model fusion, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.67)

Review for NeurIPS paper: Ensemble Distillation for Robust Model Fusion in Federated Learning

Neural Information Processing Systems | Jan-22-2025, 03:56:13 GMT

I recommend this paper for acceptance. The paper is on an important and timely topic and is above the quality bar necessary for acceptance. Although the reviewers had some concerns, the rebuttal clarified their most burning questions. I also thought that the more critical reviews were the less informed ones. Having said that, I strongly suggest taking all of the reviewers' comments into account to improve the quality of the camera-ready version, mostly with respect to the organization and clarity of the paper (including the description of the related work), and including the results provided in the rebuttal.

ensemble distillation, federated learning, robust model fusion, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Ensemble Distillation for Robust Model Fusion in Federated Learning

Neural Information Processing Systems | Oct-9-2024, 15:40:32 GMT

Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized. In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side. However, directly averaging model parameters is only possible if all models have the same structure and size, which could be a restrictive constraint in many scenarios. In this work we investigate more powerful and more flexible aggregation schemes for FL. Specifically, we propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients. This knowledge distillation technique mitigates privacy risk and cost to the same extent as the baseline FL algorithms, but allows flexible aggregation over heterogeneous client models that can differ e.g. in size, numerical precision or structure.

ensemble distillation, federated learning, robust model fusion, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ensemble Distillation for Robust Model Fusion in Federated Learning

Lin, Tao, Kong, Lingjing, Stich, Sebastian U., Jaggi, Martin

arXiv.org Machine Learning | Oct-22-2020

Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized. In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side. However, directly averaging model parameters is only possible if all models have the same structure and size, which could be a restrictive constraint in many scenarios. In this work we investigate more powerful and more flexible aggregation schemes for FL. Specifically, we propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients. This knowledge distillation technique mitigates privacy risk and cost to the same extent as the baseline FL algorithms, but allows flexible aggregation over heterogeneous client models that can differ e.g. in size, numerical precision or structure. We show in extensive empirical experiments on various CV/NLP datasets (CIFAR-10/100, ImageNet, AG News, SST2) and settings (heterogeneous models/data) that the server model can be trained much faster, requiring fewer communication rounds than any existing FL technique so far.

artificial intelligence, communication round, machine learning, (13 more...)

2006.07242

Country: