Efficient Knowledge Distillation from Model Checkpoints

Neural Information Processing Systems 

In this paper, we observe that an intermediate model, i.e., a checkpoint taken partway through training, often serves as a better teacher than the fully converged model, even though the checkpoint has much lower accuracy.
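To make the setup concrete, the following is a minimal sketch of a soft-target distillation loss in which the teacher logits may come from either an intermediate checkpoint or the converged model. The abstract does not state which loss the paper uses; the temperature-scaled KL form here is the commonly used one and is an assumption. The function names, the temperature value, and all logits are hypothetical and serve only as illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the common soft-target loss, not necessarily the paper's.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a single example (illustration only).
checkpoint_teacher_logits = [2.0, 1.0, 0.5]  # intermediate checkpoint
converged_teacher_logits = [6.0, 0.5, 0.1]   # fully converged model
student_logits = [1.0, 0.8, 0.3]

# Only the source of the teacher logits changes between the two settings.
loss_ckpt = distillation_loss(student_logits, checkpoint_teacher_logits)
loss_full = distillation_loss(student_logits, converged_teacher_logits)
```

The two loss values say nothing by themselves about which teacher is better. Under this sketch, comparing teachers would mean training one student against each and evaluating both students on held-out data.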
