Should Under-parameterized Student Networks Copy or Average Teacher Weights?

Neural Information Processing Systems 

In other words, we fit an under-parameterized "student" network with
