A Appendix

Oct-2-2025, 06:11:56 GMT–Neural Information Processing Systems

We first give a derivation of the equivalence between label smoothing regularization and Eq. 7. Evidently, the objective does not regularize confidence diversity.

"Scale both" corresponds to the originally proposed distillation objective, in which both teacher and [...] Plots of test accuracy and ECE against the amount of temperature scaling applied are shown in Figure 1. First, we observe that models trained with student scaling have ECE almost identical to that of their teacher models. In direct contrast, student models trained without student scaling generally achieve much lower calibration error than their teachers. This coupled effect could be the reason for the observed conflict between ECE and accuracy.
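To make the two quantities discussed above concrete, the following is a minimal sketch of a temperature-scaled softmax and an expected calibration error (ECE) estimate. It is not the authors' evaluation code. The function names, the choice of 10 equal-width confidence bins, and the convention that a confidence of exactly 1.0 falls in the last bin are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (illustrative helper)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (binning scheme is an assumption).

    probs:  list of per-example probability vectors
    labels: list of integer class labels
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)          # confidence = largest predicted probability
        pred = p.index(conf)   # predicted class
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    n = len(labels)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(a for _, a in b) / len(b)
        # weight each bin's |accuracy - confidence| gap by its share of examples
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece


# Toy usage: raising the temperature lowers confidence without changing the argmax,
# so accuracy stays fixed while the ECE estimate can move.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [1.2, 0.2, 1.0]]
labels = [0, 1, 2]
for t in (1.0, 4.0):
    probs = [softmax(z, temperature=t) for z in logits]
    print(t, expected_calibration_error(probs, labels))
```

Because dividing the logits by a positive temperature leaves the predicted class unchanged, any change in ECE under this sketch comes from the confidence values alone.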



