376c6b9ff3bedbbea56751a84fffc10c-Supplemental.pdf
Neural Information Processing Systems
Does Knowledge Distillation Really Work?

Here we briefly describe key implementation details to reproduce our experiments. Data augmentation details are given in A.1, followed by architecture details in A.2, and finally training details are provided in A.3. The reader is encouraged to consult the included code for closer inspection.

A.1 Data augmentation procedures

Some of the data augmentation procedures we consider attempt to generate data that is close to the train data distribution (standard augmentations, GAN, mixup). Others (random noise, out-of-domain data) produce data for distillation that the teacher would never encounter during normal supervised training.
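As a rough illustration of the two families of augmentation described above, the sketch below contrasts a mixup-style input blend (which stays near the training data) with pure random-noise inputs (which the teacher would not see in supervised training). It is not the authors' implementation: the function names, the `alpha=1.0` default, the Beta(alpha, alpha) mixing weight, and the uniform noise range are all assumptions made for this example, and inputs are plain Python lists of floats rather than image tensors.

```python
import random


def mixup_inputs(batch, alpha=1.0, rng=None):
    """Blend each example with a randomly chosen partner from the same batch.

    Hypothetical helper: draws one mixing weight lam ~ Beta(alpha, alpha)
    per batch and returns lam * x_i + (1 - lam) * x_j for each pair (i, j).
    """
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)
    partners = list(range(len(batch)))
    rng.shuffle(partners)  # partners[i] is the index mixed into example i
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(batch[i], batch[j])]
        for i, j in enumerate(partners)
    ]
    return mixed, lam, partners


def random_noise_inputs(n, dim, rng=None):
    """Generate n inputs of length dim with entries uniform in [0, 1).

    Hypothetical helper: such inputs are unrelated to the training data.
    """
    if rng is None:
        rng = random.Random()
    return [[rng.random() for _ in range(dim)] for _ in range(n)]
```

In a distillation setting, either kind of input would be passed through the teacher to obtain soft targets for the student; no ground-truth labels are needed, which is why only the inputs are mixed in this sketch.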
Aug-14-2025, 04:07:41 GMT
- Country:
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- Genre:
- Research Report (0.48)
- Industry:
- Education (0.68)
- Technology: