Reviews: Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup
Neural Information Processing Systems
This paper derives a coupled system of ODEs that models the dynamics of stochastic gradient descent in this teacher-student setup. The authors provide an asymptotic analysis of the dynamics when only the first layer is trained, in which case the generalization error increases with the size of the student network; they also obtain results when both layers are trained. All reviewers agree that it is a good contribution.
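The sketch below is a minimal illustration of the setup being summarized. It is not the paper's ODE system and not the authors' code. It simulates online SGD directly and tracks the overlap matrices Q = w·w/N, R = w·v/N and T = v·v/N, the low-dimensional quantities that ODE descriptions of this kind are usually written in. Assumptions, all chosen for illustration: both networks are "soft committee machines" with second-layer weights fixed to 1 (so only the first layer is trained), the activation is g(x) = erf(x/√2), inputs are i.i.d. standard Gaussian, and the sizes, learning rate and every function name (`sgd_step`, `generalisation_error`, `run`, ...) are hypothetical. For this activation the generalization error has a closed form in Q, R and T, which the code checks against a Monte Carlo estimate.

```python
import math
import random


def g(x):
    # Assumed activation: erf(x / sqrt(2)).
    return math.erf(x / math.sqrt(2.0))


def g_prime(x):
    # Derivative of erf(x / sqrt(2)).
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def preactivations(weights, x, n):
    # lambda_k = w_k . x / sqrt(N)
    s = math.sqrt(n)
    return [dot(w, x) / s for w in weights]


def committee_output(weights, x, n):
    # Soft committee machine: all second-layer weights fixed to 1 (assumption).
    return sum(g(l) for l in preactivations(weights, x, n))


def sgd_step(student, teacher, x, lr, n):
    # One online SGD step on 0.5 * (y_hat - y)^2, first layer only:
    # w_k <- w_k - (lr / sqrt(N)) * delta * g'(lambda_k) * x
    lam = preactivations(student, x, n)
    delta = sum(g(l) for l in lam) - committee_output(teacher, x, n)
    scale = lr * delta / math.sqrt(n)
    for w, l in zip(student, lam):
        c = scale * g_prime(l)
        for j in range(n):
            w[j] -= c * x[j]


def overlaps(a, b, n):
    # Order parameters: matrix of u . v / N for u in a, v in b.
    return [[dot(u, v) / n for v in b] for u in a]


def generalisation_error(student, teacher, n):
    # Closed form for g(x) = erf(x / sqrt(2)):
    # E[g(u) g(v)] = (2/pi) * asin(C_uv / sqrt((1 + C_uu)(1 + C_vv)))
    Q = overlaps(student, student, n)
    R = overlaps(student, teacher, n)
    T = overlaps(teacher, teacher, n)

    def ev(c, caa, cbb):
        return (2.0 / math.pi) * math.asin(c / math.sqrt((1.0 + caa) * (1.0 + cbb)))

    K, M = len(student), len(teacher)
    ss = sum(ev(Q[i][k], Q[i][i], Q[k][k]) for i in range(K) for k in range(K))
    tt = sum(ev(T[a][b], T[a][a], T[b][b]) for a in range(M) for b in range(M))
    st = sum(ev(R[i][a], Q[i][i], T[a][a]) for i in range(K) for a in range(M))
    return 0.5 * (ss + tt - 2.0 * st)


def mc_error(student, teacher, n, samples, rng):
    # Monte Carlo estimate of 0.5 * E[(y_hat - y)^2] on fresh Gaussian inputs.
    total = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        d = committee_output(student, x, n) - committee_output(teacher, x, n)
        total += 0.5 * d * d
    return total / samples


def run(n=50, k=2, m=2, lr=0.5, steps=5000, seed=0):
    # Hypothetical sizes: input dim n, student width k, teacher width m.
    rng = random.Random(seed)
    teacher = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    student = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    eg_start = generalisation_error(student, teacher, n)
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # fresh sample: online SGD
        sgd_step(student, teacher, x, lr, n)
    return eg_start, generalisation_error(student, teacher, n), student, teacher
```

Because each step uses a fresh sample, the order parameters change by O(1/N) per step, which is why analyses of this kind rescale time as steps/N in the large-N limit. Setting k larger than m is one way to probe the effect of student size mentioned above, though this toy run makes no claim about reproducing the paper's results.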
Jan-27-2025, 01:19:56 GMT