Knowledge Distillation from A Stronger Teacher

Tao Huang 1,2 Shan You 1 Fei Wang 3 Chen Qian

Neural Information Processing Systems 

We empirically find that the discrepancy between the predictions of the student and those of a stronger teacher tends to be considerably more severe.