Should Under-parameterized Student Networks Copy or Average Teacher Weights?