Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets
