5ac8bb8a7d745102a978c5f8ccdb61b8-AuthorFeedback.pdf

Neural Information Processing Systems 

We thank the reviewers for extremely helpful suggestions. Below we discuss the most significant points. In each graph the predictions (in orange) were scaled by a single multiplicative constant to fit the measurements. This constant reflects the length of each gradient step (e.g., due to the learning rate and size of training set). We will explain that our results rely on lazy training induced by overparameterization. Chizat et al. provide We will discuss this interesting question.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found