A Experiment Details

Neural Information Processing Systems 

Source code for the training pipeline, tasks, and models used in this work is available as part of the supplementary material. All experiments use the Adam [48] optimizer with a learning rate of 0.001 and a batch size of 128. To solve the differential equations, both when generating ground-truth data and when integrating the neural ODEs, we use the Tsitouras 5/4 Runge-Kutta method (Tsit5) from DifferentialEquations.jl [36].

A.1 Coupled Pendulum

The coupled pendulum dynamics are defined as

We train the MP-NODE on a dataset of 500 trajectories. Each trajectory is randomly initialized with θ in [−π/2, π/2] and θ̇ in [−1, 1], is 10 s long, and is sampled with a time step of 0.1 s. The dataset is normalized with Z-score normalization.
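The dataset construction described above can be sketched in Python with NumPy as follows. This is a minimal illustration under several assumptions that are not stated in the text: two pendula; uniform sampling of the initial θ and θ̇; a state layout of [θ_1, θ_2, θ̇_1, θ̇_2]; and Z-score statistics computed per state dimension over all trajectories and time steps. The function coupled_pendulum_rhs and its parameters g, length, and k are hypothetical placeholders (pendula coupled to their neighbours by a linear spring), not the dynamics used in this work. A fixed-step RK4 integrator stands in for the adaptive Tsit5 solver.

```python
import numpy as np

# Settings from the text: 500 trajectories, 0.1 s time step, 10 s horizon.
N_TRAJ = 500
DT = 0.1
T_END = 10.0
N_PENDULA = 2  # assumption: the number of coupled pendula is not stated here


def sample_initial_states(n_traj, n_pendula, rng):
    """Draw theta from [-pi/2, pi/2] and theta_dot from [-1, 1] (uniform: assumed)."""
    theta = rng.uniform(-np.pi / 2, np.pi / 2, size=(n_traj, n_pendula))
    theta_dot = rng.uniform(-1.0, 1.0, size=(n_traj, n_pendula))
    # Assumed state layout: [theta_1..theta_n, theta_dot_1..theta_dot_n]
    return np.concatenate([theta, theta_dot], axis=1)


def coupled_pendulum_rhs(x, g=9.81, length=1.0, k=0.5):
    """HYPOTHETICAL dynamics: neighbouring pendula joined by linear springs.
    A placeholder for illustration only, not the paper's equations."""
    n = x.shape[-1] // 2
    theta, theta_dot = x[..., :n], x[..., n:]
    coupling = np.zeros_like(theta)
    coupling[..., :-1] += theta[..., 1:] - theta[..., :-1]  # pull from right neighbour
    coupling[..., 1:] += theta[..., :-1] - theta[..., 1:]   # pull from left neighbour
    theta_ddot = -(g / length) * np.sin(theta) + k * coupling
    return np.concatenate([theta_dot, theta_ddot], axis=-1)


def rk4_rollout(f, x0, dt, n_steps):
    """Fixed-step RK4, a stand-in for the adaptive Tsit5 solver."""
    traj = [x0]
    x = x0
    for _ in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x)
    return np.stack(traj, axis=1)  # shape: (n_traj, n_steps + 1, state_dim)


def zscore(data, eps=1e-8):
    """Per-dimension Z-score over all trajectories and time steps (assumed)."""
    mean = data.mean(axis=(0, 1), keepdims=True)
    std = data.std(axis=(0, 1), keepdims=True)
    return (data - mean) / (std + eps), mean, std


rng = np.random.default_rng(0)
n_steps = int(round(T_END / DT))  # 100 steps, i.e. 101 samples per trajectory
x0 = sample_initial_states(N_TRAJ, N_PENDULA, rng)
data = rk4_rollout(coupled_pendulum_rhs, x0, DT, n_steps)
data_norm, mu, sigma = zscore(data)
print(data.shape)  # (500, 101, 4)
```

Keeping the returned mean and standard deviation makes it possible to map model predictions back to physical units via data_norm * (sigma + eps) + mu.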