FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Neural Information Processing Systems
Multimodal Large Language Models (MLLMs) have shown impressive video content understanding capabilities but struggle with fine-grained motion comprehension. To comprehensively assess the motion understanding ability of existing MLLMs, we introduce FAVOR-Bench, which comprises 1,776 videos from both ego-centric and third-person perspectives and enables assessment through both close-ended and open-ended tasks.
Jun-14-2026, 05:02:22 GMT