Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes