CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising
