Learning Multimodal Latent Dynamics for Human-Robot Interaction