NeMo: Needle in a Montage for Video-Language Understanding

Hu, Zi-Yuan; Liang, Shuo; Zheng, Duo; Li, Yanyang; Tao, Yeyao; Huang, Shijia; Feng, Wei; Qin, Jia; Yu, Jianguang; Huang, Jing; Fang, Meng; Li, Yin; Wang, Liwei

Oct-14-2025, arXiv.org Artificial Intelligence

Inspired by the needle-in-a-haystack test widely used to evaluate LLMs, we introduce a novel task, Needle in a Montage (NeMo), designed to assess VideoLLMs' critical reasoning capabilities, including long-context recall and temporal grounding. To generate video question-answering data for our task, we develop a scalable automated data generation pipeline that facilitates high-quality data synthesis. Building on this pipeline, we present NeMoBench, a video-language benchmark centered on our task. Specifically, the full set of NeMoBench features 31,378 automatically generated question-answer (QA) pairs from 13,486 videos with durations ranging from seconds to hours. Experiments demonstrate that our pipeline can reliably and automatically generate high-quality evaluation data, enabling NeMoBench to be continuously updated with the latest videos. We evaluate 20 state-of-the-art models on our benchmark, providing extensive results and key insights into their capabilities and limitations.

large language model, machine learning, natural language

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.50)
