One-for-All: Bridge the Gap of Heterogeneous Architectures in Knowledge Distillation Supplementary Material
Neural Information Processing Systems
As the current best MLP-based model achieves only 83.8% top-1 accuracy, we instead employ a ViT-B model with a top-1 accuracy of 86.53% as the teacher for cross-architecture distillation. Our OFA-KD framework yields significant top-1 accuracy gains over models trained from scratch, with improvements ranging from 1.2% to 2.6%. For the CKA analysis, we select a batch of 128 samples from the ImageNet-1K validation set and collect model activations after the activation layers. Additional CKA analysis results are provided in Figure 6.
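The CKA analysis above compares activations collected from two models on the same batch of samples. A minimal sketch of linear CKA over such activation matrices is shown below; the helper name `linear_cka` and the use of NumPy are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices of shape (batch, features).

    Hypothetical helper: computes ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    after centering each feature over the batch dimension.
    """
    # Center features along the batch dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Frobenius-norm form of linear CKA.
    cross = np.linalg.norm(Y.T @ X) ** 2
    norm_x = np.linalg.norm(X.T @ X)
    norm_y = np.linalg.norm(Y.T @ Y)
    return cross / (norm_x * norm_y)

# Example with a batch of 128 samples, matching the analysis setting.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(128, 64))   # e.g., activations from model A
acts_b = rng.normal(size=(128, 64))   # e.g., activations from model B
similarity = linear_cka(acts_a, acts_b)
```

Linear CKA is invariant to isotropic scaling and orthogonal transforms of the features, which is why it is a common choice for comparing representations across heterogeneous architectures.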