MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Xinyu Fang
Neural Information Processing Systems
The advent of large vision-language models (LVLMs) has spurred research into their applications in multimodal contexts, particularly video understanding.
Oct-10-2025, 11:52:39 GMT