FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding

Jun-14-2026, 05:02:22 GMT–Neural Information Processing Systems

Multimodal Large Language Models (MLLMs) have shown impressive video content understanding capabilities but struggle with fine-grained motion comprehension. To comprehensively assess the motion understanding ability of existing MLLMs, we introduce FAVOR-Bench, which comprises 1,776 videos captured from both egocentric and third-person perspectives and supports evaluation through both closed-ended and open-ended tasks.

artificial intelligence, large language model, natural language