




ANODEV2: A Coupled Neural ODE Framework

Neural Information Processing Systems

It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE). This observation motivated the introduction of so-called Neural ODEs, in which other discretization schemes and/or adaptive time stepping techniques can be used to improve the performance of residual networks. Here, we propose ANODEV2, which extends this approach by introducing a framework that allows ODE-based evolution for both the weights and the activations, in a coupled formulation. Such an approach provides more modeling flexibility, and it can help with generalization performance. We present the formulation of ANODEV2, derive optimality conditions, and implement the coupled framework in PyTorch.
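The correspondence between a residual update and an explicit Euler step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the right-hand side `f`, the toy coupled dynamics in `coupled_euler`, and all function names are assumptions chosen so the example stays self-contained in plain Python.

```python
import math


def f(z, t):
    """Toy right-hand side dz/dt = -z (an assumption for illustration)."""
    return -z


def residual_step(z, t, h=1.0):
    """Residual update z + h * f(z, t): one explicit Euler step of size h."""
    return z + h * f(z, t)


def euler_integrate(z0, t0, t1, n_steps):
    """Stack n_steps residual updates, i.e. explicit Euler from t0 to t1."""
    h = (t1 - t0) / n_steps
    z, t = z0, t0
    for _ in range(n_steps):
        z = residual_step(z, t, h)
        t += h
    return z


def midpoint_integrate(z0, t0, t1, n_steps):
    """Same dynamics, discretized with the explicit midpoint (RK2) scheme."""
    h = (t1 - t0) / n_steps
    z, t = z0, t0
    for _ in range(n_steps):
        k1 = f(z, t)
        k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
        z = z + h * k2
        t += h
    return z


def coupled_euler(z0, w0, t0, t1, n_steps):
    """Explicit Euler on a hypothetical coupled system:
    dz/dt = w * z (activation driven by the weight), dw/dt = -w (weight decays).
    Both states advance together at every step."""
    h = (t1 - t0) / n_steps
    z, w = z0, w0
    for _ in range(n_steps):
        z, w = z + h * (w * z), w + h * (-w)
    return z, w


if __name__ == "__main__":
    exact = math.exp(-1.0)  # closed-form solution of dz/dt = -z at t = 1
    print("Euler error:   ", abs(euler_integrate(1.0, 0.0, 1.0, 10) - exact))
    print("Midpoint error:", abs(midpoint_integrate(1.0, 0.0, 1.0, 10) - exact))
    print("Coupled (z, w):", coupled_euler(1.0, 1.0, 0.0, 1.0, 10))
```

With the same number of steps, the midpoint scheme tracks the exact solution more closely than Euler on this toy problem, which is the sense in which a different discretization can change the behavior of an otherwise identical network. The coupled function only shows the mechanics of evolving a weight and an activation together.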







Review for NeurIPS paper: Ode to an ODE

Neural Information Processing Systems

Weaknesses: While the orthogonality constraint helps with the vanishing/exploding gradient issue, as shown in Lemma 4.1, it is not clear to me whether it introduces new issues. For example, by reducing the flexibility in how W can change, training might become slow or fail to converge to a point with near-minimum loss. Theorem 1 proves convergence to a stationary point, which of course need not have small loss. It is worth noting that Lemma 4.1 is not used (as far as I can tell) in any other theoretical result. Thus, at least theoretically, the results in the paper cannot be regarded as providing support for orthogonality constraints.


Reviews: ANODEV2: A Coupled Neural ODE Framework

Neural Information Processing Systems

The PDE-inspired formulation of coupled ODEs is very interesting and could let deep learning applications draw on decades of progress in efficiently solving particular classes of coupled equations. This is a very exciting connection discovered by the authors. The central contribution of modeling weight evolution using ODEs hinges on the mentioned problem of neural ODEs exhibiting inaccuracy while recomputing activations. It appears a previous paper first reported this issue. The reviewer is not convinced about this problem.
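One way such an inaccuracy can arise is when activations are reconstructed by integrating the same ODE backward in time instead of being stored. The sketch below is a toy illustration under stated assumptions (a scalar linear ODE dz/dt = -lam * z, explicit Euler in both directions, hypothetical function names). It shows the mechanism only and says nothing about how severe the effect is in practice.

```python
def forward_euler(z0, lam, h, n_steps):
    """Integrate dz/dt = -lam * z forward in time with explicit Euler."""
    z = z0
    for _ in range(n_steps):
        z = z + h * (-lam * z)
    return z


def reverse_euler(z_end, lam, h, n_steps):
    """Integrate the same ODE backward in time (step -h) with explicit Euler,
    attempting to recover the initial state from the final one."""
    z = z_end
    for _ in range(n_steps):
        z = z - h * (-lam * z)
    return z


def reconstruction_error(z0, lam, h, n_steps):
    """Absolute gap between the true initial state and its reconstruction."""
    z_end = forward_euler(z0, lam, h, n_steps)
    z0_rec = reverse_euler(z_end, lam, h, n_steps)
    return abs(z0_rec - z0)


if __name__ == "__main__":
    # Each forward/backward pair scales z by (1 - (h * lam) ** 2),
    # so the gap grows with the product h * lam.
    for lam in (0.1, 1.0, 5.0):
        print(lam, reconstruction_error(1.0, lam, h=0.1, n_steps=10))
```

For small `h * lam` the reconstruction is nearly exact, while for larger values the recovered initial state drifts far from the true one. Whether realistic networks fall into the second regime is exactly the point the reviewer questions.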