Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training: Supplementary Material

Neural Information Processing Systems 

In Section 3.2, we proposed the cross-distillation (XD) learning scheme. The distillation objective in Eq. (10) is the inner-decorrelation minimization between the embeddings z and [z]. In addition to the correlation-based distillation loss, we also investigate a distillation loss based on the negative logarithm. To avoid an unbalanced loss magnitude, the distillation loss is introduced as a regularization term controlled by the penalty level γ:

$$\mathcal{L} = \mathcal{L}_{SACL}(z^A, z^B) + \gamma \mathcal{L}_{CD} \tag{1}$$

$$\mathcal{L}_{CD} = \left([z^A] \log z^A + [z^B] \log z^B\right) / 2 \tag{2}$$

We empirically observe that the negative-logarithm-based distillation loss fails to outperform the proposed cross-distillation loss $\mathcal{L}_{CD}$ with inner-decorrelation minimization. The ImageNet-100 comparison reports, for each method, the encoder, the number of parameters (M), and the linear evaluation accuracy.
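As an illustration of how the regularized objective in Eq. (1) combines the two terms, the following is a minimal sketch, not the authors' implementation. It rests on several assumptions that are not stated in the text: the bracketed embeddings [z] are treated as fixed targets, embeddings are passed through a softmax so that the logarithm is defined, a leading minus sign is applied so the term is non-negative (the sign is not visible in Eq. (2) as printed), and the SACL loss is supplied as a precomputed number. All function names are hypothetical.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def neg_log_term(target, student):
    """Cross-entropy-style term: -sum_i p_target[i] * log(p_student[i]).

    Assumption: both embeddings are softmax-normalized first, which the
    source does not specify.
    """
    p_t = softmax(target)
    p_s = softmax(student)
    return -sum(t * math.log(s) for t, s in zip(p_t, p_s))


def distill_loss(z_a, t_a, z_b, t_b):
    """Average the per-view terms, mirroring the division by 2 in Eq. (2).

    z_a, z_b: batches of student embeddings (lists of vectors).
    t_a, t_b: batches of target embeddings, standing in for [z^A], [z^B].
    """
    n = len(z_a)
    term_a = sum(neg_log_term(t, z) for t, z in zip(t_a, z_a)) / n
    term_b = sum(neg_log_term(t, z) for t, z in zip(t_b, z_b)) / n
    return (term_a + term_b) / 2


def total_loss(l_sacl, l_cd, gamma):
    """Eq. (1): the distillation term acts as a regularizer weighted by gamma."""
    return l_sacl + gamma * l_cd
```

With gamma set to 0 the objective reduces to the SACL loss alone, which is one way to see why the penalty level controls the relative magnitude of the two terms.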
