Reviews: Neural Ordinary Differential Equations

Neural Information Processing Systems 

Given the ever increasing importance of AD in both communities, adding to the range of scientific computing primitives for which frameworks such as autograd can efficiently compute derivatives through will hopefully spur more widespread use of gradient based learning and inference methods with ODE models and hopefully spur other frameworks with AD capability in the community such as Stan, TensorFlow and Pytorch to implement adjoint sensitivity methods. The specific suggested applications of the'ODE solver modelling primitive' in ODE-Nets, CNFs and L-ODEs are all interesting demonstrations of some of the computational and modelling advantages that come from using a continuous-time ODE mode; formulation, with in particular the memory savings possible by avoiding the need to compute all intermediate states by recomputing trajectories backwards through time being a possible major gain given that device memory is often currently a bottleneck. While'reversing' the integration to recompute the reverse trajectory is an appealing idea, it would have helped to have more discussion of when this would be expected to breakdown - for example it seems likely that highly chaotic dynamical systems would tend to be problematic as even small errors in the initial backwards steps could soon lead to very large divergences in the reversed trajectories compared to the forward ones. It seems like a useful sanity check in an implementation would be to compare the final state of the reversed trajectory to the initial state of the forward trajectory to check how closely they agree. The submission is generally very well written and presented with a clear expository style, with useful illustrative examples given in the experiments to support the claims made and well thought out figures which help to give visual intuitions about the methods and results.