Towards Multi Video Understanding Evaluation for LLMs
Neural Information Processing Systems
The advent of Multimodal Large Language Models (MLLMs) has expanded AI capabilities to visual modalities, yet existing evaluation benchmarks remain limited to single-video understanding, overlooking the critical need for multi-video understanding in real-world scenarios (e.g., sports analytics and autonomous driving). To address this significant gap, we introduce MVU-Eval, the first comprehensive benchmark for evaluating Multi-Video Understanding for MLLMs.
Jun-17-2026, 04:04:53 GMT
- Genre:
  - Research Report > Experimental Study (1.00)
- Industry:
  - Information Technology (1.00)
  - Transportation > Ground > Road (0.88)
- Technology: