HourVideo: 1-Hour Video-Language Understanding
–Neural Information Processing Systems
Our dataset consists of a novel task suite comprising summarization, perception ( , ), visual reasoning ( , , , , ), and navigation ( , ) tasks. In stark contrast, human experts significantly outperform the state-of-the-art long-context multimodal model, Gemini Pro 1.5 (85.0\% vs. 37.3\%), highlighting a substantial gap in multimodal capabilities.
Neural Information Processing Systems
Dec-26-2025, 04:37:18 GMT
- Country:
- North America > United States > California > Santa Clara County > Palo Alto (0.08)
- Technology: