Second-Order Neural ODE Optimizer

Dec-24-2025, 23:21:54 GMT–Neural Information Processing Systems

We propose a novel second-order optimization framework for training emerging deep continuous-time models, specifically Neural Ordinary Differential Equations (Neural ODEs). Since their training already involves expensive gradient computation by solving a backward ODE, deriving efficient second-order methods is highly nontrivial. Nevertheless, inspired by the recent Optimal Control (OC) interpretation of training deep networks, we show that a specific continuous-time OC methodology, called Differential Programming, can be adopted to derive backward ODEs for higher-order derivatives at the same O(1) memory cost. We further explore a low-rank representation of the second-order derivatives and show that it leads to efficient preconditioned updates with the aid of Kronecker-based factorization. The resulting method, named SNOpt, converges much faster than first-order baselines in wall-clock time, and the improvement remains consistent across various applications, e.g.
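The abstract does not spell out how the Kronecker-based factorization is used, so the following is only a minimal sketch of the general idea behind Kronecker-factored preconditioning, not SNOpt's actual implementation. It assumes a layer's curvature is approximated as a Kronecker product of two small symmetric positive semi-definite factors; the function name `kron_precondition`, the factor names `A` and `B`, and the `damping` parameter are all hypothetical. The point it illustrates is that the preconditioned update can be computed from the two small factors without ever forming the full Kronecker matrix.

```python
import numpy as np


def kron_precondition(grad, A, B, damping=1e-3):
    """Apply (A_d kron B_d)^{-1} to a matrix-shaped gradient.

    Hypothetical helper: A (m x m) and B (n x n) are assumed symmetric
    positive semi-definite curvature factors, grad has shape (m, n).
    With NumPy's row-major flattening,
        kron(A, B) @ X.ravel() == (A @ X @ B.T).ravel(),
    so the inverse only needs two small linear solves.
    Note: damping is added to each factor separately, which is not the
    same as adding damping * I to the full Kronecker product.
    """
    m, n = grad.shape
    A_d = A + damping * np.eye(m)
    B_d = B + damping * np.eye(n)
    left = np.linalg.solve(A_d, grad)        # A_d^{-1} @ grad
    return np.linalg.solve(B_d, left.T).T    # (A_d^{-1} @ grad) @ B_d^{-T}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n = 3, 4
    Ma = rng.standard_normal((m, m))
    Mb = rng.standard_normal((n, n))
    A, B = Ma @ Ma.T, Mb @ Mb.T              # random SPD-like factors
    G = rng.standard_normal((m, n))

    update = kron_precondition(G, A, B, damping=1.0)

    # Reference: build the full (m*n x m*n) matrix and solve directly.
    full = np.kron(A + np.eye(m), B + np.eye(n))
    reference = np.linalg.solve(full, G.ravel()).reshape(m, n)
    print(np.allclose(update, reference))
```

The factored solve works on an m x m and an n x n system, whereas the reference solve works on an (m·n) x (m·n) system, which is the cost saving that makes this kind of preconditioner practical for large layers.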


