5ac8bb8a7d745102a978c5f8ccdb61b8-AuthorFeedback.pdf
–Neural Information Processing Systems
We thank the reviewers for extremely helpful suggestions. Below we discuss the most significant points. In each graph the predictions (in orange) were scaled by a single multiplicative constant to fit the measurements. This constant reflects the length of each gradient step (e.g., due to the learning rate and size of training set). We will explain that our results rely on lazy training induced by overparameterization. Chizat et al. provide We will discuss this interesting question.
Neural Information Processing Systems
Oct-9-2025, 14:10:05 GMT
- Technology: