Reviews: Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup

Neural Information Processing Systems 

This paper derives a coupled system of ODEs that models the dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup. The authors provide an asymptotic analysis of the dynamics when only the first layer is trained, in which case the generalization error increases with the size of the student network; they also obtain results for the case where both layers are trained. All reviewers agree that it is a good contribution.