How does a human being distinguish the motions of a piece of paper and a piece of cloth? A high-school physics teacher might answer that they are both tangentially inextensible but cloth cannot resist any bending force from the normal direction.
Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained