Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time

Neural Information Processing Systems 

They do not study how the population loss decreases during the first stage.
