Learning the Simplest Neural ODE

Okamoto, Yuji, Takeuchi, Tomoya, Sakemi, Yusuke

May-6-2025–arXiv.org Machine Learning

Because ODE models allow one to embed inherent properties, for example Hamiltonian structure [4] or stability guarantees [5-7], Neural ODEs often achieve accurate long-term forecasting. Moreover, treating the solution map of an ODE as a diffeomorphism has led to applications in generative modeling. Compared to normalizing flows [8], such continuous-time models can yield low-parameter, memory-efficient generators [9]. Gradient-based optimization is the de facto standard for learning NN-defined dynamics, enabled by the adjoint method [1] for efficient gradient computation. Nonetheless, several issues arise: (i) sensitivity of gradients to the sampling interval of time-series data, (ii) variation of convergence speed depending on NN initialization, and (iii) difficulty of tuning hyperparameters such as the learning rate. These factors often cause training instability or vanishing/exploding gradients. Existing work addresses them heuristically by restricting the search space of dynamics [9] or by tailored initialization [10]. However, the fundamental question--why is training Neural ODE hard?
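Issue (i) can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a scalar linear ODE dx/dt = a·x with a single observation at time T, and uses the closed-form solution x(T) = x0·exp(a·T) instead of an ODE solver or the adjoint method. The names `loss_and_grad` and `finite_difference_grad` and all numeric values are hypothetical, chosen only to show how the gradient of a squared-error loss with respect to `a` scales with the sampling interval T.

```python
import math


def loss_and_grad(a, a_true, x0, T):
    """Squared-error loss at time T for dx/dt = a*x, and its derivative w.r.t. a.

    Assumption (illustrative only): the target is generated by the same
    scalar linear ODE with parameter a_true, observed once at time T.
    """
    pred = x0 * math.exp(a * T)          # model state at time T
    target = x0 * math.exp(a_true * T)   # observed state at time T
    residual = pred - target
    loss = 0.5 * residual ** 2
    # d(pred)/da = T * x0 * exp(a*T) = T * pred, so the chain rule gives:
    grad = residual * T * pred
    return loss, grad


def finite_difference_grad(a, a_true, x0, T, eps=1e-6):
    """Central-difference check of the analytic gradient."""
    up, _ = loss_and_grad(a + eps, a_true, x0, T)
    down, _ = loss_and_grad(a - eps, a_true, x0, T)
    return (up - down) / (2 * eps)


if __name__ == "__main__":
    # Same parameters, different sampling intervals.
    for T in (0.1, 1.0, 5.0):
        _, g = loss_and_grad(a=1.0, a_true=0.5, x0=1.0, T=T)
        print(f"T={T}: gradient w.r.t. a = {g:.4g}")
```

In this toy setting the gradient carries a factor T·exp(a·T), so the same parameter error produces a far larger gradient at long sampling intervals than at short ones, which is one way a fixed learning rate can be too small for one dataset and too large for another.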

artificial intelligence, gradient, machine learning, (11 more...)

Country:
- Europe > Austria
  - Vienna (0.14)
- Asia > Japan
  - Honshū
    - Kantō > Chiba Prefecture
      - Chiba (0.04)
    - Kansai > Kyoto Prefecture
      - Kyoto (0.05)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (0.70)
  - Machine Learning > Statistical Learning
    - Gradient Descent (0.30)
