MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs (Supplementary Material)

Jun-17-2026, 04:04:56 GMT–Neural Information Processing Systems

In this section, we introduce the construction pipeline for generating MVU-Eval QA pairs based on each data source. These questions include: (1) Object Recognition, (2) Spatial Understanding, (3) Counting, (4) Knowledge-intensive Reasoning, and (5) Temporal Reasoning. These generated questions, answers, and candidate choices are manually checked by humans. Pipelines for constructing video pairs are slightly different across datasets. By default, 2-6 videos are randomly sampled, regardless of their labels.
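The default sampling step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `sample_video_group`, its parameters, and the choice to draw the group size uniformly between 2 and 6 are assumptions made for illustration. The text only states that 2-6 videos are randomly sampled without regard to their labels.

```python
import random


def sample_video_group(video_ids, min_videos=2, max_videos=6, rng=None):
    """Randomly pick a group of videos, ignoring any labels.

    Hypothetical helper: the uniform choice of group size is an
    assumption, not something the source specifies.
    """
    if rng is None:
        rng = random.Random()
    if len(video_ids) < min_videos:
        raise ValueError("not enough videos to form a group")
    # Group size is capped by the pool size so sampling never fails.
    upper = min(max_videos, len(video_ids))
    group_size = rng.randint(min_videos, upper)  # inclusive on both ends
    # Sample without replacement so no video appears twice in a group.
    return rng.sample(list(video_ids), group_size)


if __name__ == "__main__":
    pool = [f"video_{i:03d}" for i in range(20)]  # placeholder identifiers
    print(sample_video_group(pool, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the sampled groups reproducible, which helps when the resulting QA pairs are later checked by hand.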

artificial intelligence, large language model, natural language

Industry:
- Education (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.51)
  - Vision > Video Understanding (0.40)
