A Appendix A.1 Implementation of DIST

Aug-19-2025, 09:01:25 GMT–Neural Information Processing Systems

This section presents the implementation code of DIST, as shown in Figure 4. The purpose of these methods is to learn the similarity relationships between instances from the teacher, e.g., the semantic spaces of instances with the same KD methods, which helps achieve better performance, especially when the student is trained with a stronger teacher. Here we conduct experiments to investigate the efficacy of our method with cosine similarity. As discussed in the main text, matching functions such as KL divergence and MSE are used in KD to match the outputs of the student and the teacher.
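
To make the matching functions named above concrete, the following is a minimal pure-Python sketch, not the code from Figure 4. It compares student and teacher outputs with KL divergence, MSE, and a cosine-similarity-based distance, applied either per instance (rows) or across instances (columns). The helper names (`softmax`, `cosine_distance`, `relation_loss`, `transpose`), the toy logits, and the use of `1 - cosine similarity` as the loss are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn one row of logits into a probability vector."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_teacher, p_student, eps=1e-12):
    """KL(teacher || student) for one pair of probability vectors."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(p_teacher, p_student)
    )


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity; unchanged when either vector is rescaled."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def relation_loss(student_rows, teacher_rows):
    """Average cosine distance over paired rows (hypothetical helper)."""
    pairs = list(zip(student_rows, teacher_rows))
    return sum(cosine_distance(s, t) for s, t in pairs) / len(pairs)


def transpose(matrix):
    """Swap rows and columns so each row holds one class across instances."""
    return [list(col) for col in zip(*matrix)]


# Toy batch: 2 instances, 3 classes (values are illustrative only).
student_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
teacher_logits = [[4.0, 2.0, 0.2], [1.0, 5.0, 0.6]]
student_probs = [softmax(row) for row in student_logits]
teacher_probs = [softmax(row) for row in teacher_logits]

# Exact matching of outputs, as in KD with KL divergence or MSE.
kl_loss = sum(
    kl_divergence(t, s) for t, s in zip(teacher_probs, student_probs)
) / len(student_probs)
mse_loss = sum(
    mse(s, t) for s, t in zip(student_probs, teacher_probs)
) / len(student_probs)

# Relation matching with cosine similarity: per instance, then across instances.
per_instance_loss = relation_loss(student_probs, teacher_probs)
across_instances_loss = relation_loss(
    transpose(student_probs), transpose(teacher_probs)
)
```

In this sketch, KL divergence and MSE reach zero only when the student reproduces the teacher's outputs exactly, whereas the cosine-based distance is already zero when the two vectors differ only by a positive scale factor.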
