Supplementary Material

Neural Information Processing Systems 

We evaluated the proposed hybrid distillation scheme on two popular ResNet models, ResNet18 and ResNet50, and on MobileNetV2, together with their skeptical variants. For the ResNet models, we added auxiliary classifiers (ACs) after basic block layers 2 and 3. For the MobileNetV2 variant, the ACs are placed after stage 4 and stage 6, where a stage is a combination of linear bottleneck layers as defined in [1]. Note that the skeptical models have more parameters than their baseline counterparts because of the added ACs.
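
As a rough illustration of the parameter overhead introduced by the ACs, the sketch below counts the extra parameters for the two ResNet variants. It rests on assumptions that are not stated in the text: each AC is taken to be global average pooling followed by a single linear layer, the stage widths are those of the standard ImageNet-style ResNet definitions, "basic block layers 2 and 3" are read as the second and third residual stages, and the 10-class output is a placeholder. The names `STAGE_CHANNELS`, `ac_param_count`, and `skeptical_overhead` are hypothetical.

```python
# Hypothetical sketch: extra parameters added by auxiliary classifiers (ACs).
# Assumption: each AC is global average pooling followed by one linear layer;
# the actual AC architecture is not specified in the text.

# Output channel widths of the four residual stages in the standard
# ImageNet-style ResNet18 and ResNet50 definitions.
STAGE_CHANNELS = {
    "resnet18": [64, 128, 256, 512],
    "resnet50": [256, 512, 1024, 2048],
}


def ac_param_count(in_channels: int, num_classes: int) -> int:
    """Weights plus biases of a single linear head on pooled features."""
    return in_channels * num_classes + num_classes


def skeptical_overhead(model: str, ac_stages=(2, 3), num_classes: int = 10) -> int:
    """Total extra parameters from ACs attached after the given 1-indexed stages."""
    channels = STAGE_CHANNELS[model]
    return sum(ac_param_count(channels[s - 1], num_classes) for s in ac_stages)


if __name__ == "__main__":
    for name in STAGE_CHANNELS:
        # num_classes=10 is an assumed placeholder, not a value from the paper.
        print(name, skeptical_overhead(name))
```

Under these assumptions the overhead grows with the channel width at the attachment point, so the same AC placement costs more in ResNet50 than in ResNet18. A heavier AC design (for example, one with additional convolutional layers) would increase the overhead accordingly.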