One-for-All: Bridge the Gap of Heterogeneous Architectures in Knowledge Distillation Supplementary Material

Neural Information Processing Systems 

As the current best MLP-based model achieves a top-1 accuracy of only 83.8%, we instead employ a ViT-B model with a top-1 accuracy of 86.53% as the teacher model for cross-architecture distillation. Our OFA-KD framework yields significant top-1 accuracy improvements over models trained from scratch, ranging from 1.2% to 2.6%. For the CKA analysis, we select a batch of 128 samples from the ImageNet-1K validation set and collect model activations after each activation layer, as sketched below. Further CKA results are provided in Figure 6.
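To make the analysis procedure concrete, the following is a minimal sketch of how activations can be collected with forward hooks and compared with linear CKA (Kornblith et al., 2019). It assumes PyTorch; the helper names `collect_activations` and `linear_cka`, and the choice of `nn.GELU` as the activation layer to hook, are illustrative assumptions rather than the paper's exact implementation, which may use a different CKA variant or layer selection.

```python
import torch
import torch.nn as nn


def collect_activations(model: nn.Module, inputs: torch.Tensor,
                        layer_type=nn.GELU) -> list[torch.Tensor]:
    """Record the flattened output of every activation layer via forward hooks.

    NOTE: hooking nn.GELU is an assumption; swap in the activation type
    actually used by the architecture under analysis (e.g. nn.ReLU).
    """
    feats, handles = [], []

    def hook(_module, _inp, out):
        # Flatten to (batch, features) so layers of any shape are comparable.
        feats.append(out.detach().flatten(start_dim=1))

    for m in model.modules():
        if isinstance(m, layer_type):
            handles.append(m.register_forward_hook(hook))
    with torch.no_grad():
        model(inputs)
    for h in handles:
        h.remove()
    return feats


def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between activation matrices of shape (n_samples, features)."""
    # Center each feature dimension over the batch.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = torch.linalg.norm(y.t() @ x) ** 2
    norm_x = torch.linalg.norm(x.t() @ x)
    norm_y = torch.linalg.norm(y.t() @ y)
    return cross / (norm_x * norm_y)
```

Given the same batch of 128 validation images fed to a teacher and a student, computing `linear_cka(teacher_feats[i], student_feats[j])` over all layer pairs produces the kind of layer-wise similarity heatmap shown in Figure 6.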
