Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation
Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang
–arXiv.org Artificial Intelligence
Knowledge Distillation has shown very promising ability in transferring learned representations from the larger model (teacher) to the smaller one (student). Despite many efforts, prior methods ignore the important role of retaining the inter-channel correlation of features, leading to a failure to capture the intrinsic distribution of the feature space and the sufficient diversity of features in the teacher network. To solve this issue, we propose the novel Inter-Channel Correlation for Knowledge Distillation (ICKD), with which the diversity and homology of the feature space of the student network can align with that of the teacher network. The correlation between two channels is interpreted as diversity if they are irrelevant to each other, and otherwise as homology. The student is then required to mimic the correlation within its own embedding space. In addition, we introduce grid-level inter-channel correlation, making the method capable of dense prediction tasks.

Figure 1: Illustration of inter-channel correlation. The channels orderly extracted from the second layer of ResNet18 have been visualized. The channels denoted by red boxes are homologous to each other both perceptually and mathematically (e.g., by inner product), while the channels denoted by orange boxes are diverse. We show that the inter-channel correlation can effectively measure whether each channel is homologous or diverse to the others, which further reflects the richness …
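To make the idea concrete, the following is a minimal PyTorch-style sketch of an inter-channel correlation matching loss: each channel of a feature map is flattened, pairwise inner products form a C x C correlation matrix, and the student is trained to match the teacher's matrix. The function names (inter_channel_correlation, ickd_loss), tensor shapes, and normalization are illustrative assumptions, not the authors' released implementation; the paper's learnable projection for aligning channel dimensions and the grid-level variant are omitted here.

import torch
import torch.nn.functional as F


def inter_channel_correlation(feat: torch.Tensor) -> torch.Tensor:
    """Per-sample C x C inter-channel correlation matrix.

    feat: (N, C, H, W) feature map. Each channel is flattened to a vector
    of length H*W; entry (i, j) is the inner product of channels i and j.
    """
    n, c, h, w = feat.shape
    flat = feat.view(n, c, h * w)
    # (N, C, C); scaled by the number of spatial positions for stability
    return torch.bmm(flat, flat.transpose(1, 2)) / (h * w)


def ickd_loss(feat_s: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
    """Student mimics the teacher's inter-channel correlation matrix.

    Assumes the student features have already been projected (e.g., by a
    1x1 convolution) to the teacher's channel count and spatial size.
    """
    corr_s = inter_channel_correlation(feat_s)
    corr_t = inter_channel_correlation(feat_t)
    return F.mse_loss(corr_s, corr_t)


if __name__ == "__main__":
    # Toy check with random tensors standing in for real network activations.
    student_feat = torch.randn(2, 64, 8, 8)
    teacher_feat = torch.randn(2, 64, 8, 8)
    print(ickd_loss(student_feat, teacher_feat))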
Feb-8-2022