ACVUBench: Audio-Centric Video Understanding Benchmark

Yang, Yudong, Zhuang, Jimin, Sun, Guangzhi, Tang, Changli, Li, Yixuan, Li, Peihan, Jiang, Yifan, Li, Wei, Ma, Zejun, Zhang, Chao

Mar-25-2025–arXiv.org Artificial Intelligence

Audio often serves as an auxiliary modality in video understanding tasks of audio-visual large language models (LLMs), merely assisting in the comprehension of visual information. However, a thorough understanding of videos significantly depends on auditory information, as audio offers critical context, emotional cues, and semantic meaning that visual data alone often lacks. This paper proposes an audio-centric video understanding benchmark (ACVUBench) to evaluate the video comprehension capabilities of multimodal LLMs with a particular focus on auditory information. Specifically, ACVUBench incorporates 2,662 videos spanning 18 different domains with rich auditory information, together with over 13k high-quality human annotated or validated question-answer pairs. Moreover, ACVUBench introduces a suite of carefully designed audio-centric tasks, holistically testing the understanding of both audio content and audio-visual interactions in videos. A thorough evaluation across a diverse range of open-source and proprietary multimodal LLMs is performed, followed by the analyses of deficiencies in audio-visual LLMs. Demos are available at https://github.com/lark-png/ACVUBench.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Mar-25-2025

arXiv.org PDF

Add feedback

Country:
- Africa > Central Africa (0.04)
- Europe
  - Norway (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.50)

Industry:
- Leisure & Entertainment (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.96)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found