Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer
Neural Information Processing Systems
Knowledge distillation can generally be divided into offline and online categories, according to whether the teacher model is pre-trained and kept fixed during the distillation process. Offline distillation can reuse existing models, yet it consistently underperforms online distillation. In this paper, we first show empirically that the essential factor behind this performance gap is the reversed distillation from student to teacher, rather than the training fashion. Offline distillation can achieve a competitive performance gain by fine-tuning the pre-trained teacher to adapt to the student with such reversed distillation. However, this fine-tuning process still consumes a large training budget.
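The two distillation directions mentioned above can be illustrated with a minimal sketch. The abstract does not state which loss is used, so this assumes the common temperature-softened KL divergence; the function names, the temperature of 4.0, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def distillation_loss(source_logits, target_logits, temperature=4.0):
    """KL(source || target) on softened outputs, scaled by T^2.

    `source` is the model being imitated and `target` is the model being
    trained. This is an assumed, generic loss, not the paper's stated one.
    """
    p = softmax(source_logits, temperature)
    q = softmax(target_logits, temperature)
    return temperature ** 2 * kl_divergence(p, q)


# Toy logits for a single 3-class example (hypothetical values).
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]

# Usual direction: the student is trained to match the teacher.
forward_loss = distillation_loss(teacher_logits, student_logits)

# Reversed direction: the teacher is adapted toward the student.
reversed_loss = distillation_loss(student_logits, teacher_logits)
```

In an offline setting only `forward_loss` would be minimized, with the teacher held fixed. In the setting the abstract describes, the reversed term would additionally update the teacher, which is what distinguishes the two regimes in this sketch.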
Apr-24-2026, 08:36:44 GMT
- Genre:
- Research Report (0.46)
- Industry:
- Education (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (0.94)
- Machine Learning > Neural Networks (0.94)
- Natural Language (0.67)