Review for NeurIPS paper: Ode to an ODE


Weaknesses: While the orthogonality constraint helps with the vanishing/exploding-gradients issue, as shown in Lemma 4.1, it is not clear to me that it does not introduce new problems. For example, by reducing the flexibility in how W can change, training might become slow or fail to converge to a point with near-minimum loss. Theorem 1 proves convergence to a stationary point, which of course need not have small loss. It is also worth noting that Lemma 4.1 is not used (as far as I can tell) in any other theoretical result. Thus, at least theoretically, the results in the paper cannot be regarded as providing support for the orthogonality constraint.
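To make the trade-off concrete, here is a minimal numpy sketch (my own illustration, not taken from the paper) of the property Lemma 4.1 relies on: a product of orthogonal matrices preserves vector norms exactly, so backpropagated gradients can neither vanish nor explode through such a chain, whereas a product of unconstrained random matrices lets the norm drift multiplicatively. The matrix dimensions, depth, and Gaussian initialization below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth = 64, 50

def random_orthogonal(d):
    # QR decomposition of a Gaussian matrix yields an orthogonal Q.
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

# A unit vector standing in for a backpropagated gradient.
v = rng.normal(size=d)
v /= np.linalg.norm(v)

g_orth = v.copy()
g_free = v.copy()
for _ in range(depth):
    # Orthogonal chain: norm is preserved exactly at every step.
    g_orth = random_orthogonal(d) @ g_orth
    # Unconstrained Gaussian chain (variance 1/d): norm random-walks,
    # so over many layers it tends to shrink or blow up.
    g_free = (rng.normal(size=(d, d)) / np.sqrt(d)) @ g_free

print(np.linalg.norm(g_orth))  # stays at 1.0 up to floating-point error
print(np.linalg.norm(g_free))  # drifts away from 1.0
```

The demonstration supports the lemma's mechanism, but it also illustrates the reviewer's concern: the price of exact norm preservation is that every layer is restricted to the orthogonal group, which is exactly the loss of flexibility whose optimization consequences the paper's theory does not address.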