Long-FormVideo-LanguagePre-Trainingwith MultimodalTemporalContrastiveLearning

Neural Information Processing Systems 

Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks.