What Makes a "Good" Data Augmentation in Knowledge Distillation - A Statistical Perspective

Neural Information Processing Systems 

Why do some DA schemes (e.g., CutMix) inherently perform much better than