Two Causally Related Needles in a Video Haystack
Neural Information Processing Systems
Properly evaluating the ability of Video-Language Models (VLMs) to understand long videos remains a challenge. We propose a long-context video understanding benchmark, CAUSAL2NEEDLES, that assesses two crucial abilities insufficiently addressed by existing benchmarks: (1) extracting information from two separate locations (two needles) in a long video and understanding them jointly, and (2) modeling the world in terms of cause and effect in human behaviors.
Jun-23-2026, 02:59:57 GMT
- Country:
- Asia (0.28)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Technology:
- Information Technology
- Communications (1.00)
- Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Natural Language
- Large Language Model (1.00)
- Chatbot (0.71)
- Machine Learning > Neural Networks
- Deep Learning (0.96)