Should Under-parameterized Student Networks Copy or Average Teacher Weights?
