Appendix for "Comprehensive Knowledge Distillation with Causal Intervention"

Neural Information Processing Systems 

CIFAR-10 is an image classification dataset containing 50,000 training images and 10,000 test images across 10 classes. We adopt the standard data augmentation strategy for the CIFAR datasets: each image is padded by 4 pixels on each side, randomly flipped horizontally, and then randomly cropped back to 32 × 32. CIFAR-100 comprises images similar to those in CIFAR-10 but has 100 classes. Tiny ImageNet is a subset of ImageNet.
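
The CIFAR augmentation described above (pad, flip, crop) can be illustrated with a minimal sketch. This is not the authors' code: the function name `augment`, the zero fill value for padding, the flip probability of 0.5, and the single-channel list-of-rows image layout are all assumptions made for illustration. For a colour image the same steps would be applied to each channel with the same random choices.

```python
import random


def augment(image, pad=4, size=32, rng=random):
    """Pad, randomly flip, and randomly crop one single-channel image.

    `image` is a list of rows, each row a list of pixel values.
    Zero padding and a flip probability of 0.5 are assumptions;
    the text only states the pad width and the output size.
    """
    width = len(image[0])
    fill = 0  # assumed padding value

    # 1. Pad `pad` pixels on every side.
    blank_row = [fill] * (width + 2 * pad)
    padded = [list(blank_row) for _ in range(pad)]
    padded += [[fill] * pad + list(row) + [fill] * pad for row in image]
    padded += [list(blank_row) for _ in range(pad)]

    # 2. Flip horizontally with probability 0.5 (assumed).
    if rng.random() < 0.5:
        padded = [row[::-1] for row in padded]

    # 3. Crop a size x size window at a random offset.
    top = rng.randint(0, len(padded) - size)
    left = rng.randint(0, len(padded[0]) - size)
    return [row[left:left + size] for row in padded[top:top + size]]


# Usage: a dummy 32 x 32 image whose pixel values are all non-zero.
dummy = [[r * 32 + c + 1 for c in range(32)] for r in range(32)]
augmented = augment(dummy, rng=random.Random(0))
```

Passing a seeded `random.Random` instance, as in the usage line, makes the augmentation reproducible; the output always has the same 32 × 32 shape as the input.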