Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
Shen, Zhiqiang, Liu, Zechun, Xu, Dejia, Chen, Zitian, Cheng, Kwang-Ting, Savvides, Marios
arXiv.org Artificial Intelligence
This work aims to empirically clarify the recently raised claim that label smoothing is incompatible with knowledge distillation (Müller et al., 2019). We begin by introducing the motivation behind this claimed incompatibility, namely that label smoothing erases the relative information among teacher logits. We then provide a novel view of how label smoothing affects the distributions of semantically similar and dissimilar classes, and propose a metric to quantitatively measure the degree of erased information in a sample's representation. We next examine the one-sidedness and limitations of the incompatibility view through extensive analyses, visualizations, and comprehensive experiments on Image Classification, Binary Networks, and Neural Machine Translation. Finally, we broadly discuss several circumstances under which label smoothing does indeed lose its effectiveness.

Recently, a growing body of work has explored the underlying relationship between these two methods. For instance, Müller et al. (2019) found that label smoothing implicitly improves calibration but hurts the effectiveness of knowledge distillation. Yuan et al. (2019) viewed knowledge distillation as a dynamic form of label smoothing, since it delivers a regularization effect during training. Lukasik et al. (2020) further observed that label smoothing can help mitigate label noise, showing that when distilling from noisy data, a teacher trained with label smoothing is helpful.
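Since the abstract discusses label smoothing and knowledge distillation only in the abstract, a minimal sketch of the standard formulations of the two losses may help ground the discussion. This is a generic PyTorch illustration, not the authors' implementation; the function names and the epsilon, temperature, and alpha values are assumptions.

```python
# Minimal sketch (not the paper's code): standard label smoothing and
# knowledge-distillation losses, shown only to make the two techniques
# under study concrete. Hyperparameter values are illustrative assumptions.
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, epsilon=0.1):
    """Cross-entropy against smoothed targets: the true class gets
    probability 1 - epsilon, the remaining mass is spread uniformly."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)  # uniform (smoothing) component
    return ((1.0 - epsilon) * nll + epsilon * uniform).mean()

def distillation_kl(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as in Hinton et al. (2015)."""
    t = temperature
    student_log_p = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_p, teacher_p, reduction="batchmean") * (t * t)

if __name__ == "__main__":
    # Toy usage: combine a hard-label term and a distillation term for the
    # student; alpha balances the two and is an assumed value here.
    student_logits = torch.randn(8, 10)
    teacher_logits = torch.randn(8, 10)
    targets = torch.randint(0, 10, (8,))
    alpha = 0.9
    loss = (1 - alpha) * label_smoothing_ce(student_logits, targets) \
           + alpha * distillation_kl(student_logits, teacher_logits)
    print(loss.item())
```

The incompatibility question studied in the paper concerns the interaction of these two terms: whether training the teacher with the smoothed cross-entropy above degrades the signal the student receives through the distillation term.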
Apr-1-2021