Supplementary Material for "Improved Feature Distillation via Projector Ensemble"

Neural Information Processing Systems

1 Ablation Studies

In this section, we further investigate the effectiveness of the proposed method when the feature dimensions of the student and teacher are different. In our experiments, we find that simply initializing the projectors with different seeds, using the default initialization of the linear layer in PyTorch, is sufficient to yield good performance. Therefore, we keep this strategy to make the proposed method as simple as possible. We use the ResNet32x4-ResNet8x4 teacher-student pair on CIFAR-100 and report the top-1 classification accuracy. The L2 distances between projectors with and without the regularization term are shown in Table 3. The generalization performance of networks distilled by different methods is shown in Table 4.
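
The seeding strategy and the L2 distance between projectors can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes each projector is a single linear map and samples its weights from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), which matches the documented default range for `nn.Linear` weights in PyTorch. The function names, feature dimensions, and number of projectors are hypothetical and chosen only for illustration.

```python
import math
import random


def init_linear_weight(out_features, in_features, seed):
    """Sample a weight matrix from U(-1/sqrt(in_features), 1/sqrt(in_features)).

    Assumption: this mirrors the default weight range of a PyTorch linear layer.
    """
    rng = random.Random(seed)  # independent generator per projector
    bound = 1.0 / math.sqrt(in_features)
    return [[rng.uniform(-bound, bound) for _ in range(in_features)]
            for _ in range(out_features)]


def l2_distance(w_a, w_b):
    """L2 (Frobenius) distance between two equally shaped weight matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(w_a, w_b)
                         for a, b in zip(row_a, row_b)))


# Hypothetical sizes: 64-d student features projected to 128-d teacher features.
STUDENT_DIM, TEACHER_DIM, NUM_PROJECTORS = 64, 128, 3

# One projector per seed; the seed is the only thing that differs between them.
projectors = [init_linear_weight(TEACHER_DIM, STUDENT_DIM, seed)
              for seed in range(NUM_PROJECTORS)]

# Pairwise distances show how far apart the differently seeded projectors start.
for i in range(NUM_PROJECTORS):
    for j in range(i + 1, NUM_PROJECTORS):
        dist = l2_distance(projectors[i], projectors[j])
        print(f"projectors {i} and {j}: L2 distance = {dist:.4f}")
```

Reusing a seed reproduces the same projector exactly, so the pairwise distance is nonzero only because the seeds differ. The same `l2_distance` helper could be applied to trained weights to compare runs with and without a regularization term.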
