Latent Neural ODEs with Sparse Bayesian Multiple Shooting
Iakovlev, Valerii, Yildiz, Cagatay, Heinonen, Markus, Lähdesmäki, Harri
arXiv.org Artificial Intelligence
Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that requires various tricks, such as trajectory splitting, to make training work in practice. These methods are often heuristics with poor theoretical justification, and they require iterative manual tuning. We propose a principled multiple shooting technique for neural ODEs that splits the trajectories into manageable short segments, which are optimised in parallel, while ensuring probabilistic control on continuity over consecutive segments. We derive variational inference for our shooting-based latent neural ODE models and propose amortized encodings of irregularly sampled trajectories with a transformer-based recognition network with temporal attention and relative positional encoding. We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.
Dynamical systems, from biological cells to weather, evolve according to their underlying mechanisms, often described by differential equations. In data-driven system identification we aim to learn the rules governing a dynamical system by observing the system for a time interval [0, T] and fitting a model of the underlying dynamics to the observations by gradient descent. Such optimisation suffers from the curse of length: the complexity of the loss function grows with the length of the observed trajectory (Ribeiro et al., 2020). Even for moderate T the loss landscape can become highly complex, and gradient descent fails to produce a good fit (Metz et al., 2021). To alleviate this problem, previous works resort to cumbersome heuristics, such as iterative training and trajectory splitting (Yildiz et al., 2019; Kochkov et al., 2021; HAN et al., 2022; Lienen & Günnemann, 2022).
The optimal control literature has a long history of multiple shooting methods, where the trajectory fitting is split into piecewise segments that are easy to optimise, with constraints to ensure continuity across the segments (van Domselaar & Hemker, 1975; Bock & Plitt, 1984; Baake et al., 1992).
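The segment-wise idea described above can be illustrated with a small Python sketch. It is not the paper's method: it assumes a scalar state, a fixed-step explicit Euler integrator, and a plain quadratic continuity penalty standing in for the probabilistic continuity control mentioned in the abstract. The names `euler_rollout`, `multiple_shooting_loss`, and `continuity_weight` are hypothetical and chosen only for illustration.

```python
import math


def euler_rollout(f, x0, t0, t1, n_steps=20):
    """Integrate dx/dt = f(x) from t0 to t1 with explicit Euler steps."""
    x = x0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + h * f(x)
    return x


def multiple_shooting_loss(f, shooting_states, times, observations,
                           continuity_weight=10.0):
    """Data-fit loss plus a soft continuity penalty between segments.

    shooting_states[i] is a free initial state for the segment
    [times[i], times[i + 1]]. Each segment is integrated independently,
    so the rollouts could be run in parallel.
    """
    n_segments = len(shooting_states)
    data_term = 0.0
    continuity_term = 0.0
    for i in range(n_segments):
        start = shooting_states[i]
        # The segment start should match the observation at times[i].
        data_term += (start - observations[i]) ** 2
        end = euler_rollout(f, start, times[i], times[i + 1])
        if i + 1 < n_segments:
            # Soft constraint: the segment end should meet the next start.
            continuity_term += (end - shooting_states[i + 1]) ** 2
        else:
            # The last segment end is compared with the final observation.
            data_term += (end - observations[i + 1]) ** 2
    return data_term + continuity_weight * continuity_term


# Toy data (assumed for illustration): x(t) = 2 * exp(-0.5 * t).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
observations = [2.0 * math.exp(-0.5 * t) for t in times]
shooting_states = observations[:-1]  # initialise segment starts at the data

loss_good = multiple_shooting_loss(lambda x: -0.5 * x, shooting_states,
                                   times, observations)
loss_bad = multiple_shooting_loss(lambda x: 0.5 * x, shooting_states,
                                  times, observations)
```

In this toy setup the correct dynamics give a much smaller loss than the wrong-sign dynamics, because only the correct vector field lets each short segment land close to the start of the next one. In a trainable version both the dynamics parameters and the shooting states would be optimised jointly by gradient descent.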
Feb-8-2023