Which priors matter? Benchmarking models for learning latent dynamics