What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations
Liu, Dongqi; Whitehouse, Chenxi; Yu, Xi; Mahon, Louis; Saxena, Rohit; Zhao, Zheng; Qiu, Yifu; Lapata, Mirella; Demberg, Vera
arXiv.org Artificial Intelligence
Transforming recorded videos into concise and accurate textual summaries is a growing challenge in multimodal learning. This paper introduces VISTA, a dataset specifically designed for video-to-text summarization in scientific domains. VISTA contains 18,599 recorded AI conference presentations paired with their corresponding paper abstracts. We benchmark the performance of state-of-the-art large models and apply a plan-based framework to better capture the structured nature of abstracts. Both human and automated evaluations confirm that explicit planning enhances summary quality and factual consistency. However, a considerable gap remains between models and human performance, highlighting the challenges of scientific video summarization.
Feb-26-2025
- Country:
  - North America
    - Dominican Republic (0.04)
    - United States
      - Washington > King County
        - Seattle (0.04)
      - Michigan > Washtenaw County
        - Ann Arbor (0.04)
      - Florida > Miami-Dade County
        - Miami (0.04)
      - California > San Diego County
        - San Diego (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - Canada > Ontario
      - Toronto (0.04)
  - Europe
    - Germany > Saarland (0.04)
    - Belgium (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Middle East > Malta
      - Eastern Region > Northern Harbour District > St. Julian's (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
  - Asia
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
- Genre:
  - Research Report
    - New Finding (1.00)
    - Experimental Study (0.93)
- Industry:
  - Media (0.46)
- Technology: