c76e4b2fa54f8506719a5c0dc14c2eb9-AuthorFeedback.pdf

Neural Information Processing Systems 

As the reviewer mentions, prior work often assumes directional convergence and alignment, but neither indicates a possible proof, nor even provides conclusive evidence.

We agree that a discrete-time analysis is essential. We touched upon this in our "Concluding Remarks", but will expand the material; for instance, one can easily adapt our analysis to handle extremely small step sizes, but handling a practical step-size choice is much more challenging.

Briefly, on the empirical side, we point the reviewer to the large-scale experiment we cited, by Shallue et al. (2018), which finds that even uncommonly large amounts of training do not seem to hurt generalization.

Regarding "it seems not hard to show the initialization is close to an optimal classifier", we first stress that this is an orthogonal concern to our directional convergence result, which ensures gradient flow eventually stabilizes.
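As a toy illustration of what "directional convergence" means, the sketch below runs plain gradient descent with a small constant step (a crude discretization of gradient flow) on the logistic loss of a linear model over linearly separable data. This is an assumption-laden stand-in, not the deep-network setting or the analysis discussed above: the function name `logistic_gd`, the data, the step size and the checkpoints are all hypothetical. It shows the norm of the iterate growing without bound while its normalized direction changes less and less.

```python
import numpy as np


def logistic_gd(X, y, step=0.1, iters=20000, checkpoints=(100, 1000, 10000, 20000)):
    """Hypothetical helper: gradient descent on the empirical logistic loss
    (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)) for a linear model w.

    A small constant step is used as a rough stand-in for gradient flow.
    Returns {t: (||w_t||, w_t / ||w_t||)} at the requested iterations."""
    n, d = X.shape
    w = np.zeros(d)
    out = {}
    for t in range(1, iters + 1):
        margins = y * (X @ w)
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)); written via tanh for stability.
        weights = 0.5 * (1.0 - np.tanh(margins / 2.0))
        grad = -(X.T @ (weights * y)) / n
        w = w - step * grad
        if t in checkpoints:
            norm = np.linalg.norm(w)
            out[t] = (norm, w / norm)
    return out


# Hypothetical linearly separable data (no bias term).
X = np.array([[2.0, 0.5], [1.0, 1.5], [-1.5, -0.5], [-0.5, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

checkpoints = logistic_gd(X, y)
for t in sorted(checkpoints):
    norm, direction = checkpoints[t]
    print(f"t={t:>6}  ||w||={norm:.3f}  direction={np.round(direction, 4)}")
```

In this linear, separable case the loss can only be driven to zero by sending the norm to infinity, so the meaningful question is whether the direction settles; the printed directions at later checkpoints should agree closely even as the norm keeps growing.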
