Video Captioning with Guidance of Multimodal Latent Topics