How a student becomes a teacher: learning and forgetting through Spectral methods

Lorenzo Giambagli

Neural Information Processing Systems 

The above scheme proves particularly relevant when the student network is overparameterized, that is, when its layers are larger than those of the underlying teacher network. Under these operating conditions, it is tempting to speculate that the student's ability to handle the given task could eventually be stored in a sub-portion of the whole network.
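
To make the overparameterized setting concrete, the following minimal sketch builds a small teacher network and a student whose hidden layer is ten times wider. It is an illustration only: the layer sizes, the tanh activation, the Gaussian initialization, and all function names are assumptions introduced here, and the sketch covers neither the training step nor the spectral scheme referred to above.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return an n_out x n_in weight matrix with small Gaussian entries."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def make_network(n_in, n_hidden, n_out, rng):
    """Two-layer network stored as (hidden weights, readout weights)."""
    return (init_layer(n_in, n_hidden, rng), init_layer(n_hidden, n_out, rng))


def forward(weights, x):
    """Apply a tanh hidden layer followed by a linear readout."""
    w_hidden, w_out = weights
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def count_parameters(weights):
    """Total number of weights across both layers."""
    return sum(len(row) for layer in weights for row in layer)


rng = random.Random(0)

# Hypothetical sizes: the student hidden layer is ten times the teacher's.
N_IN, N_OUT = 10, 1
TEACHER_HIDDEN = 5
STUDENT_HIDDEN = 50

teacher = make_network(N_IN, TEACHER_HIDDEN, N_OUT, rng)
student = make_network(N_IN, STUDENT_HIDDEN, N_OUT, rng)

# The teacher labels random inputs; a student would be trained on these pairs.
inputs = [[rng.gauss(0.0, 1.0) for _ in range(N_IN)] for _ in range(100)]
targets = [forward(teacher, x) for x in inputs]

teacher_params = count_parameters(teacher)  # 5*10 + 1*5 = 55
student_params = count_parameters(student)  # 50*10 + 1*50 = 550
```

With these assumed sizes the student holds ten times as many weights as the teacher that generated the data. This is the situation in which one may ask whether a sub-portion of the student suffices for the task.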