Review for NeurIPS paper: Weak Form Generalized Hamiltonian Learning

Neural Information Processing Systems 

Correctness: The paper makes no explicit reference to held-out (test) data. Is that what is meant in Appendix D (ODE Model Comparison Metrics)? Are those 50 initial conditions different from the ones used for training? If so, do all reported metrics and figures come from these 50 held-out initial conditions? My understanding is that Section 4.1 compares ways to learn a pendulum model using a fully connected neural network that does not attempt to learn an energy function.