Multi-head Knowledge Distillation for Model Compression