Efficient Knowledge Distillation from Model Checkpoints
Neural Information Processing Systems
In this paper, we observe that an intermediate model, i.e., a checkpoint saved partway through training, often serves as a better teacher than the fully converged model, even though the checkpoint has much lower accuracy.
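The abstract does not state which distillation objective is used. As a minimal sketch, assuming the common temperature-scaled KL objective between teacher and student outputs, the snippet below shows that the teacher enters the loss only through its logits, so a mid-training checkpoint can be swapped in for the converged model without changing anything else. The function names, the temperature value, and all logits are hypothetical illustrations, not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the standard soft-label distillation loss, used here
    only as an illustration; the paper's exact objective may differ.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft labels
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for a single 3-class example.
student = [1.0, 0.8, 0.6]
intermediate_teacher = [2.0, 1.0, 0.5]   # checkpoint from mid-training
converged_teacher = [8.0, 0.5, 0.1]      # fully trained model

# The same loss function accepts either teacher; only the logits change.
loss_intermediate = distillation_loss(student, intermediate_teacher)
loss_converged = distillation_loss(student, converged_teacher)
```

In a real training loop the teacher logits would come from a forward pass of the chosen checkpoint, and this term would typically be combined with the usual cross-entropy loss on the ground-truth labels.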
Dec-27-2025, 15:56:16 GMT