Distilling Knowledge via Intermediate Classifier Heads