Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training
Supplementary Material
Neural Information Processing Systems
In Section 3.2, we proposed the cross-distillation (XD) learning scheme.

ImageNet-1K. The encoders (MobileNet, EfficientNet, ResNet-50) are trained on ImageNet-1K from scratch for 100/200/300 epochs with the proposed method. We set the batch size to 256 and the learning rate to 0.8. We employ the LARS optimizer with weight decay set to 1.5e-6. The hidden-layer dimension of the projector is 4096.
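As a point of reference for the optimizer named above, the following is a minimal NumPy sketch of a single layer-wise LARS update (You et al.), using the learning rate and weight decay listed in this section; the trust coefficient (here 0.001) and the omission of momentum are simplifying assumptions for illustration, not details taken from our training recipe.

```python
import numpy as np

def lars_step(w, grad, lr=0.8, weight_decay=1.5e-6, trust_coeff=1e-3):
    """One layer-wise LARS update (momentum omitted for clarity).

    The step for each layer is rescaled by a "trust ratio" of the
    parameter norm to the (decayed) gradient norm, which is what lets
    LARS train stably at large batch sizes and learning rates.
    """
    g = grad + weight_decay * w          # gradient with weight decay
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    if w_norm > 0 and g_norm > 0:
        local_lr = trust_coeff * w_norm / g_norm
    else:
        local_lr = 1.0                   # fall back to plain SGD scaling
    return w - lr * local_lr * g
```

With a positive gradient the update moves the weights downward, scaled per layer rather than globally, which is the key difference from plain SGD.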