Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer

Apr-24-2026, 08:36:44 GMT–Neural Information Processing Systems

Knowledge distillation can be broadly divided into offline and online categories, according to whether the teacher model is pre-trained and persistent during the distillation process. Offline distillation can reuse existing models, yet it consistently performs worse than online distillation. In this paper, we first show empirically that the essential factor behind this performance gap is the reversed distillation from student to teacher, rather than the training fashion. Offline distillation can achieve competitive performance gains by fine-tuning the pre-trained teacher to adapt to the student with such reversed distillation. However, this fine-tuning process still consumes a substantial training budget.
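
To make the two directions in the abstract concrete, here is a minimal sketch of a common temperature-scaled soft-label distillation loss, applied once from teacher to student and once in reverse. It is an illustration under stated assumptions, not the paper's method: the function names, the example logits, and the temperature of 4.0 are all hypothetical, and the abstract does not say which loss the authors use.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(source_logits, target_logits, temperature=4.0):
    """Soft-label loss pulling the target toward the source: T^2 * KL(source || target).

    The temperature value is an assumed default, not taken from the paper.
    """
    p_source = softmax(source_logits, temperature)
    p_target = softmax(target_logits, temperature)
    return temperature ** 2 * kl_divergence(p_source, p_target)

# Hypothetical logits for a three-class problem.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]

# Forward distillation: the student is trained to match the teacher.
forward = distill_loss(teacher_logits, student_logits)
# Reversed distillation: the teacher is trained to match the student.
reverse = distill_loss(student_logits, teacher_logits)

print(f"forward (teacher -> student): {forward:.4f}")
print(f"reverse (student -> teacher): {reverse:.4f}")
```

In an offline setup only the forward term would update the student while the teacher stays fixed. Under these assumptions, an online or fine-tuning setup would also apply the reverse term to the teacher.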

distillation, machine learning, natural language

Genre:
- Research Report (0.46)

Industry:
- Education (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (0.94)
  - Machine Learning > Neural Networks (0.94)
  - Natural Language (0.67)

Title
Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer Lujun Li
