self-distillation (SD) and label smoothing (LS) as MAP insightful ([R2], [R3], [R4]), that relating accuracy to confidence

Neural Information Processing Systems 

We thank all reviewers for their constructive feedback! We address the reviewers' comments below and will incorporate all feedback. This explains why SD outperforms LS. Please refer to our response to [R3] for a discussion of CD. One can alternatively compute the variance of prediction confidence.
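
To make the last point concrete, here is a minimal sketch of one way such a variance could be computed. It assumes that "prediction confidence" means the maximum softmax probability of each sample and that the variance is taken over samples; both are assumptions for illustration, and the helper names `softmax` and `confidence_variance` are hypothetical rather than taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_variance(batch_logits):
    """Population variance of per-sample confidence.

    Assumption: confidence = max softmax probability of each sample.
    """
    confidences = [max(softmax(row)) for row in batch_logits]
    mean = sum(confidences) / len(confidences)
    return sum((c - mean) ** 2 for c in confidences) / len(confidences)


# Toy logits (made up for illustration, not results from the paper).
uniform_batch = [[0.0, 0.0], [0.0, 0.0]]   # every sample equally uncertain
mixed_batch = [[0.0, 0.0], [10.0, 0.0]]    # one uncertain, one confident sample
```

Under these assumptions, a batch whose samples all receive the same confidence has zero variance, while a batch mixing confident and uncertain predictions has a strictly positive one.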