Improved Feature

Neural Information Processing Systems 

In this section, we further investigate the effectiveness of the proposed method when the feature dimensions of the student and teacher are different. In our experiments, we find that simply initializing different projectors with different seeds and the default initialization method of the linear layer in PyTorch is sufficient to yield good performance. Therefore, we stick to this strategy to keep the proposed method as simple as possible. Experimental results show that mixing different initialization methods has a slight impact on the performance and is a potential way to further improve the distillation performance. We can see that the training time and memory usage of our method increase slightly as the number of projectors increases.
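
As a minimal sketch of the seeding strategy described above, the following dependency-free Python builds several linear projectors that map a student feature to the teacher's dimension, each from its own seed. It is an illustration under stated assumptions, not the implementation used in the experiments. The weights are drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), which is the range produced by the default initialization of `nn.Linear` in PyTorch. The function names, the feature sizes, the number of projectors, and the rule of averaging the projector outputs are all assumptions made for this example.

```python
import math
import random


def make_projector(student_dim, teacher_dim, seed):
    """Build one linear projector (weight, bias) from a dedicated seed.

    Values are drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), the range
    produced by the default nn.Linear initialization in PyTorch.
    """
    rng = random.Random(seed)
    bound = 1.0 / math.sqrt(student_dim)
    weight = [[rng.uniform(-bound, bound) for _ in range(student_dim)]
              for _ in range(teacher_dim)]
    bias = [rng.uniform(-bound, bound) for _ in range(teacher_dim)]
    return weight, bias


def project(projector, feature):
    """Apply y = W x + b to a single student feature vector."""
    weight, bias = projector
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weight, bias)]


def ensemble_project(projectors, feature):
    """Average the outputs of all projectors (assumed combination rule)."""
    outputs = [project(p, feature) for p in projectors]
    return [sum(values) / len(outputs) for values in zip(*outputs)]


def parameter_count(projectors):
    """Count the scalar parameters held by a list of projectors."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in projectors)


# Hypothetical sizes: 4-d student feature, 6-d teacher feature, 3 projectors.
STUDENT_DIM, TEACHER_DIM, NUM_PROJECTORS = 4, 6, 3

# One seed per projector, so every projector starts from different weights.
projectors = [make_projector(STUDENT_DIM, TEACHER_DIM, seed)
              for seed in range(NUM_PROJECTORS)]

student_feature = [0.5, -1.0, 0.25, 2.0]
projected = ensemble_project(projectors, student_feature)
print(len(projected))  # the projected feature has the teacher's dimension
```

In this sketch the parameter count grows linearly with the number of projectors, which is in line with the observation that training time and memory usage rise slightly as projectors are added.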
