Learning Intuitive Physics with Multimodal Generative Models