Supplementary Material for "Improved Feature Distillation via Projector Ensemble"
Neural Information Processing Systems

1 Ablation Studies
In this section, we further investigate the effectiveness of the proposed method when the feature dimensions of the student and teacher are different. In our experiments, we find that simply initializing the projectors with different seeds, using the default initialization method of the linear layer in PyTorch, is sufficient to yield good performance. Therefore, we stick to this strategy to keep the proposed method as simple as possible. We use the teacher-student pair ResNet32x4-ResNet8x4 on CIFAR-100 and report the top-1 classification accuracy. The L2 distances between projectors with and without the regularization term are shown in Table 3. From some ... The generalization performance of networks distilled by different methods is shown in Table 4.
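The seed-based initialization and the projector distances described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes that each projector is a bias-free linear map whose weights are drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), which is the range produced by PyTorch's default linear-layer weight initialization. It also assumes that the ensemble averages the projector outputs and that the "L2 distance" is the Frobenius norm of the weight difference. The function names and toy dimensions are hypothetical.

```python
import math
import random


def init_linear(in_dim, out_dim, seed):
    """Return an (out_dim x in_dim) weight matrix drawn from U(-b, b).

    b = 1 / sqrt(in_dim) mirrors the range of PyTorch's default
    nn.Linear weight initialization (bias omitted here, an assumption).
    """
    rng = random.Random(seed)  # independent generator per projector
    bound = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-bound, bound) for _ in range(in_dim)]
            for _ in range(out_dim)]


def l2_distance(w_a, w_b):
    """Frobenius (L2) distance between two equally shaped weight matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(w_a, w_b)
                         for a, b in zip(row_a, row_b)))


def ensemble_project(projectors, feature):
    """Average the outputs of all projectors (assumed ensembling rule)."""
    outputs = [[sum(w * x for w, x in zip(row, feature)) for row in proj]
               for proj in projectors]
    return [sum(column) / len(outputs) for column in zip(*outputs)]


# Toy sizes (assumed): student features are narrower than teacher features.
STUDENT_DIM, TEACHER_DIM, NUM_PROJECTORS = 4, 6, 3

# One projector per seed; only the seed differs between projectors.
projectors = [init_linear(STUDENT_DIM, TEACHER_DIM, seed=s)
              for s in range(NUM_PROJECTORS)]

student_feature = [0.5, -1.0, 0.25, 2.0]
projected = ensemble_project(projectors, student_feature)

# Pairwise distances between projectors, the kind of quantity Table 3 compares.
pairwise = {(i, j): l2_distance(projectors[i], projectors[j])
            for i in range(NUM_PROJECTORS)
            for j in range(i + 1, NUM_PROJECTORS)}
```

In PyTorch itself, a comparable effect can be obtained by calling `torch.manual_seed(seed)` before constructing each `torch.nn.Linear`, so that the default initializer draws different weights for each projector.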