End-to-end Generative Pretraining for Multimodal Video Captioning
