
 directional convergence


Directional convergence and alignment in deep learning

Neural Information Processing Systems

The above theories, with finite width networks, usually require the weights to stay close to initialization in certain norms. By contrast, practitioners run their optimization methods as long as their computational budget allows [Shallue et al., 2018], and if the data can be perfectly classified, the




We thank the reviewers for their comments and time

Neural Information Processing Systems

We thank the reviewers for their comments and time. We will address these comments in our revisions. We agree that a discrete-time analysis is essential. ... Work section that this question is tricky, and has stymied many mathematicians. As discussed at the end of Section 1.1 ... We surveyed and cited recent work, e.g., (Davis et al., ... Please refer to lines 2-9 above for more details.






