Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space

Neural Information Processing Systems

Distilling knowledge from an ensemble of teacher models is expected to yield better performance than distilling from a single teacher. Current methods mainly adopt a vanilla average rule, i.e., they simply take the average of all teacher losses for training the student network. However, this approach treats teachers equally and ignores the diversity among them. When conflicts or competitions exist among teachers, which is common, the inner compromise can hurt the distillation performance. In this paper, we examine the diversity of teacher models in the gradient space and regard ensemble knowledge distillation as a multi-objective optimization problem, so that we can determine a better optimization direction for the training of the student network. Besides, we also introduce a tolerance parameter to accommodate disagreement among teachers. In this way, our method can be seen as a dynamic weighting method for each teacher in the ensemble.
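The gradient-space view described in the abstract can be illustrated with a small sketch. For two teachers, a standard result from multi-objective optimization gives a closed-form convex combination of the per-teacher distillation gradients that minimizes the norm of the combined direction, so that conflicting teachers are weighted adaptively instead of averaged. This is only an illustrative assumption based on the min-norm (MGDA-style) formulation, not the paper's exact algorithm; the function name `min_norm_two` and the inputs are hypothetical, and the paper's tolerance parameter is not modeled here.

```python
import numpy as np

def min_norm_two(g1, g2):
    """Min-norm convex combination of two teacher gradients.

    Solves min_{a in [0,1]} || a*g1 + (1-a)*g2 ||^2 in closed form.
    Returns the combined update direction and the weights (a, 1-a),
    which act as dynamic per-teacher weights.
    """
    diff = g2 - g1
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: any convex combination is the same direction.
        alpha = 0.5
    else:
        # Unconstrained minimizer, clipped to the simplex [0, 1].
        alpha = float(np.clip((diff @ g2) / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2, (alpha, 1.0 - alpha)
```

For orthogonal teacher gradients such as `[1, 0]` and `[0, 1]`, the weights come out equal; when one teacher's gradient dominates (e.g. `[2, 0]` vs. `[1, 0]`), the combination collapses onto the smaller gradient, mirroring how a min-norm rule deviates from the vanilla average when teachers disagree.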


Review for NeurIPS paper: Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space

Neural Information Processing Systems

The motivation of AE-KD is to encourage an optimization direction for the student that is guided equally by all the teachers. However, considering that there may be weak teachers (with low generalization accuracy) in the ensemble teacher pool, why are these weak teachers treated equally with the strong teachers in the gradient space? Intuitively, the student's guidance should favor the strong teachers and keep away from the weak ones. What is the difference between them? 3. How are the weights \alpha_m in Eq. (11) optimized? Are they optimized end-to-end together with the student?

