Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun
–Neural Information Processing Systems
Large-scale video-language pre-training has significantly improved performance on video-language understanding tasks. Previous studies of video-language pre-training mainly focus on short-form videos (i.e., within 30 seconds) and sentences,
Aug-19-2025, 20:57:35 GMT