Teach Less, Learn More: On the Undistillable Classes in Knowledge Distillation

Neural Information Processing Systems

Knowledge distillation (KD) can effectively compress neural networks by training a smaller network (the student) to mimic the behavior of a larger one (the teacher). A counter-intuitive observation is that a larger teacher does not necessarily make a better student, but the reasons for this phenomenon remain unclear. In this paper, we demonstrate that it is directly attributable to the presence of undistillable classes: when trained with distillation, the teacher's knowledge of some classes is incomprehensible to the student model. We observe that while KD improves overall accuracy, it comes at the cost of the student becoming less accurate on these undistillable classes. After establishing their widespread existence in state-of-the-art distillation methods, we illustrate their correlation with the capacity gap between the teacher and student models. Finally, we present a simple Teach Less, Learn More (TLLM) framework to identify and discard the undistillable classes during training.
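
For a concrete picture, the sketch below shows (i) a standard temperature-scaled KD loss and (ii) one plausible way to realize the "identify and discard" idea: flag classes on which the distilled student is less accurate than a non-distilled baseline, then drop the teacher's signal for samples of those classes. This is a minimal sketch under assumed conventions, not the paper's actual TLLM implementation; the names kd_loss, find_undistillable_classes, class_mask, and the flagging criterion are illustrative assumptions.

```python
# Hedged sketch of KD with a per-class "distill / don't distill" mask.
# Function names and the masking criterion are assumptions, not from the paper.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9, class_mask=None):
    """Cross-entropy plus temperature-scaled KL to the teacher.

    class_mask: optional bool tensor [num_classes]; samples whose target class is
    masked out (undistillable) receive only the cross-entropy term.
    """
    ce = F.cross_entropy(student_logits, targets, reduction="none")          # [B]
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="none",
    ).sum(dim=1) * (T * T)                                                   # [B]
    if class_mask is not None:
        distill = class_mask[targets].float()   # 1.0 = keep the teacher's signal
    else:
        distill = torch.ones_like(ce)
    loss = (1.0 - alpha * distill) * ce + alpha * distill * kl
    return loss.mean()

def find_undistillable_classes(acc_kd_per_class, acc_baseline_per_class, margin=0.0):
    """Flag classes where the distilled student is less accurate than a student
    trained without distillation (one plausible criterion, not the paper's)."""
    undistillable = acc_kd_per_class < (acc_baseline_per_class - margin)
    return ~undistillable   # mask: True = keep distilling this class
```

In practice one would estimate the per-class accuracies on a validation split, recompute the mask periodically during training, and pass it to kd_loss; the margin and the update schedule would be additional hyper-parameters.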


A Limitations and Potential Negative Social Impacts

Neural Information Processing Systems

Our work investigates the "larger teacher, worse student" phenomenon in knowledge distillation. However, we only discuss image classification; therefore, we do not guarantee the validity of our observations on other tasks, e.g., object detection. In addition, these classes can be sensitive, e.g., related to gender, and we hope future work can completely resolve this issue. Since most of these methods provide hyper-parameters for CIFAR-100, we do not modify them. In Section 2.2 we use a modified ResNet-24 as the student to perform KD with a ResNet-56 teacher model. We have noted that undistillable classes exist across various methods, and Table 1 gives a comprehensive list of the methods we studied.


