Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
