Review for NeurIPS paper: Optimizing Neural Networks via Koopman Operator Theory

Additional Feedback: As noted above, one of the biggest drawbacks of this otherwise very interesting work is the limited scope of the demonstrations. I believe this should be easy to address, and were it done I would feel comfortable raising my score. It would also be useful to see a more detailed empirical study of the choice of the window (t1-t2) used to collect the data that informs the operator approximation. There are a couple of details I would like to see spelled out to improve reproducibility. In terms of related work, there are some more tangential directions where connections could potentially be made, and which may be interesting for the authors to consider: in particular, there may be a connection to the observation that the initial stages of gradient descent identify a subspace in which most of the subsequent parameter evolution occurs (i.e., one containing the lottery ticket weights).
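To make the window question concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of the kind of experiment I have in mind: fit a linear Koopman-style operator K by least squares on consecutive parameter snapshots taken from a window [t1, t2), in the spirit of dynamic mode decomposition. The function and variable names are illustrative only; the toy dynamics are a known linear map so the fit can be checked.

```python
import numpy as np

def fit_koopman_operator(snapshots, t1, t2):
    """Fit K minimizing ||Y - X K||_F over a snapshot window.

    snapshots: array of shape (T, d), one flattened parameter vector per step.
    Uses pairs (w_t, w_{t+1}) with t1 <= t < t2 - 1 (hypothetical setup).
    """
    X = snapshots[t1:t2 - 1]   # states at steps t1 .. t2-2
    Y = snapshots[t1 + 1:t2]   # states one step later
    K, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return K                   # shape (d, d)

# Toy check: snapshots generated by a known linear map A are collected,
# and the operator fitted on a window [2, 15) should recover A.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
w = rng.standard_normal(2)
traj = [w]
for _ in range(20):
    w = w @ A
    traj.append(w)
traj = np.array(traj)

K = fit_koopman_operator(traj, t1=2, t2=15)
print(np.allclose(K, A, atol=1e-6))
```

Sweeping t1 and t2 in an experiment of this shape, and reporting how the quality of the recovered operator (or of the downstream weight predictions) varies with the window, would directly address the concern above.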