Why does Knowledge Distillation Work? Rethink its Attention and Fidelity Mechanism
