Efficient Knowledge Distillation from Model Checkpoints

Dec-27-2025, 15:56:16 GMT – Neural Information Processing Systems

In this paper, we observe that an intermediate model, i.e., a checkpoint taken partway through training, often serves as a better teacher than the fully converged model, even though its accuracy is much lower.

information, intermediate model, teacher model



Country:
- Asia > China > Beijing > Beijing (0.04)

Genre:
- Research Report (0.46)

Industry:
- Education (0.96)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
