Revisit the Power of Vanilla Knowledge Distillation: from Small Scale to Large Scale
Supplementary Material

Apr-25-2026, 18:54:18 GMT–Neural Information Processing Systems

A.1 Details of "stronger recipe"

In Table 1 of our main paper, we evaluate the impact of limited model capacity [1] and small-scale datasets by comparing results obtained with the "previous training recipe" and with our "stronger recipe". Table 13 summarizes the details of the "stronger recipe".

Table 13: Stronger training strategy used for distillation. "B" and "C" denote the strategies for training students on ImageNet-1K and CIFAR100, respectively.

A.2 Numerical results

In Figure 1 of our main paper, we compare the performance gaps between vanilla KD and two logits-based baselines, DKD [2] and DIST [3], on two datasets of different scales. The comparison shows that evaluating only on small-scale datasets underestimates vanilla KD.
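For reference, vanilla KD is usually written as a KL divergence between temperature-softened teacher and student outputs, scaled by T^2, combined with ordinary cross-entropy on the ground-truth labels. The sketch below assumes that standard formulation. The function names (`log_softmax`, `kd_term`, `vanilla_kd_loss`) and the defaults `temperature=4.0` and `alpha=0.5` are illustrative placeholders, not the settings used in this paper or listed in Table 13. DKD and DIST replace the KL term with other logit-based losses and are not shown here.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of a single logit vector."""
    z = [x / temperature for x in logits]
    m = max(z)  # subtracting the max avoids overflow and leaves the result unchanged
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_norm for v in z]


def kd_term(student: Sequence[Sequence[float]],
            teacher: Sequence[Sequence[float]],
            temperature: float) -> float:
    """Batch-mean KL(p_teacher || p_student) on softened outputs, scaled by T^2."""
    total = 0.0
    for s_row, t_row in zip(student, teacher):
        log_p_s = log_softmax(s_row, temperature)
        log_p_t = log_softmax(t_row, temperature)
        # KL(p_t || p_s) = sum_k p_t[k] * (log p_t[k] - log p_s[k])
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # The T^2 factor keeps soft-target gradients on a comparable scale across temperatures
    return total / len(student) * temperature ** 2


def vanilla_kd_loss(student: Sequence[Sequence[float]],
                    teacher: Sequence[Sequence[float]],
                    labels: Sequence[int],
                    temperature: float = 4.0,  # hypothetical default
                    alpha: float = 0.5) -> float:  # hypothetical weighting
    """alpha * KD term + (1 - alpha) * cross-entropy (computed at T = 1)."""
    ce = -sum(log_softmax(row)[y] for row, y in zip(student, labels)) / len(labels)
    return alpha * kd_term(student, teacher, temperature) + (1.0 - alpha) * ce


if __name__ == "__main__":
    student = [[1.0, 0.2, -0.5], [0.1, 0.3, 2.0]]
    teacher = [[2.0, 0.0, -1.0], [0.0, 0.5, 3.0]]
    print(vanilla_kd_loss(student, teacher, labels=[0, 2]))
```

When the student's logits equal the teacher's, the KD term is zero and only the weighted cross-entropy remains. Setting `alpha=1.0` gives a pure soft-target objective.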

