Logit-Based Losses Limit the Effectiveness of Feature Knowledge Distillation
