Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training: Supplementary Material

Neural Information Processing Systems 

In Section 3.2, we proposed the cross-distillation (XD) learning scheme.

ImageNet-1K. The encoders (MobileNet, EfficientNet, ResNet-50) are trained from scratch on ImageNet-1K for 100/200/300 epochs with the proposed method. We set the batch size to 256 and the learning rate to 0.8, and employ the LARS optimizer with weight decay 1.5e-6. The hidden layer dimension of the projector is 4096.
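To make the projector hyperparameter concrete, the following is a minimal PyTorch sketch of a projection head with a 4096-dimensional hidden layer, as stated above. The input dimension (2048, matching a ResNet-50 backbone), the output dimension (256), and the Linear-BatchNorm-ReLU layout are assumptions common to contrastive-learning projectors, not details confirmed by the paper.

```python
import torch
import torch.nn as nn


class Projector(nn.Module):
    """Hypothetical projection head: hidden dim 4096 per the text;
    in_dim/out_dim and the layer layout are illustrative assumptions."""

    def __init__(self, in_dim: int = 2048, hidden_dim: int = 4096, out_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Maps backbone features (B, in_dim) to embeddings (B, out_dim).
        return self.net(x)
```

For example, `Projector()(torch.randn(8, 2048))` yields an embedding batch of shape `(8, 256)`; with a different backbone (e.g. MobileNet), `in_dim` would change accordingly.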