Swapped Logit Distillation via Bi-level Teacher Alignment

Stephen Ekaputra Limantoro, Jhe-Hao Lin, Chih-Yu Wang, Yi-Lung Tsai, Hong-Han Shuai, Ching-Chun Huang, Wen-Huang Cheng

May-26-2025–arXiv.org Artificial Intelligence

In mainstream knowledge distillation, the teacher transfers knowledge to the student directly through its original output distribution, which may contain incorrect predictions. In this article, we propose a logit-based distillation method built on swapped logit processing, named Swapped Logit Distillation (SLD). SLD rests on two assumptions: (1) a prediction is wrong when the confidence assigned to the ground-truth label is not the maximum; (2) the "natural" upper limit of the target probability remains uncertain, because the best amount of value to add to the target cannot be determined. To address these issues, we propose a swapped logit processing scheme. We find that the swap can be applied effectively to both the teacher and the student outputs, which turns them into two teachers. We further introduce loss scheduling to improve the alignment of the two teachers. Extensive experiments on image classification tasks show that SLD consistently outperforms previous state-of-the-art methods. Code is available on GitHub.
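The swapped logit processing and the two-teacher setup described in the abstract can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' released code: it assumes the swap exchanges the ground-truth logit with the largest logit whenever the two differ, that both swapped outputs (teacher and student) are matched to the student with a temperature-scaled KL divergence, and that loss scheduling simply switches on the student-derived term after a warm-up period. The names `swap_logits`, `sld_loss`, `scheduled_weight`, `T`, and `warmup_epochs` are hypothetical.

```python
import numpy as np


def swap_logits(logits: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Assumed swap rule: if the argmax is not the ground-truth class,
    exchange the target logit with the maximum logit in that row.
    logits: (batch, classes); targets: (batch,) integer labels."""
    out = logits.copy()
    pred = logits.argmax(axis=1)
    wrong = pred != targets                  # rows with an incorrect prediction
    rows = np.arange(logits.shape[0])[wrong]
    t, p = targets[wrong], pred[wrong]
    # Right-hand side reads the original logits, so the two entries are exchanged.
    out[rows, t], out[rows, p] = logits[rows, p], logits[rows, t]
    return out


def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Temperature-scaled, numerically stable softmax over the class axis."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kl_div(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Batch-mean KL(p || q) between row-wise distributions."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


def sld_loss(teacher_logits, student_logits, targets, T=4.0, student_teacher_weight=1.0):
    """Hypothetical two-teacher loss: the swapped teacher and the swapped
    student both serve as targets for the student's softened output."""
    s_prob = softmax(student_logits, T)
    t_swapped = softmax(swap_logits(teacher_logits, targets), T)
    s_swapped = softmax(swap_logits(student_logits, targets), T)  # treated as a fixed target
    loss_teacher = kl_div(t_swapped, s_prob) * T**2   # T^2 scaling as in standard KD
    loss_student_teacher = kl_div(s_swapped, s_prob) * T**2
    return loss_teacher + student_teacher_weight * loss_student_teacher


def scheduled_weight(epoch: int, warmup_epochs: int = 30) -> float:
    """Hypothetical schedule: keep the student-as-teacher term off during warm-up."""
    return 0.0 if epoch < warmup_epochs else 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 10))
    student = rng.normal(size=(8, 10))
    labels = rng.integers(0, 10, size=8)
    for epoch in (0, 50):
        w = scheduled_weight(epoch)
        print(epoch, sld_loss(teacher, student, labels, student_teacher_weight=w))
```

NumPy only computes loss values here. In an actual training loop the same logic would be written in an autodiff framework, with the swapped student distribution detached so that it acts as a fixed target rather than receiving gradients.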

artificial intelligence, distillation, machine learning
