Video Summarization: Towards Entity-Aware Captions

Ayyubi, Hammad A.; Liu, Tianqi; Nagrani, Arsha; Lin, Xudong; Zhang, Mingda; Arnab, Anurag; Han, Feng; Zhu, Yukun; Liu, Jialu; Chang, Shih-Fu

Dec-1-2023–arXiv.org Artificial Intelligence

Popular video captioning benchmarks and models deal with generic captions that lack named entities such as specific persons, places, or organizations. News videos, in contrast, present a challenging setting in which the caption needs such named entities to summarize the content meaningfully. We therefore propose the task of summarizing news videos directly into entity-aware captions. We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task. Further, we propose a method that augments visual information from videos with context retrieved from external world knowledge to generate entity-aware captions. We demonstrate the effectiveness of our approach on three video captioning models. We also show that our approach generalizes to an existing news image captioning dataset. Based on these experiments and the insights they provide, we believe this work establishes a solid basis for future research on this challenging task.

caption, knowledge, video

Country:
- North America > United States
  - New York (0.04)
  - California > San Francisco County
    - San Francisco (0.14)
- Europe
  - Ukraine (0.04)
  - Portugal (0.04)
  - Spain > Community of Madrid
    - Madrid (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.05)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
- Asia
  - Afghanistan (0.14)
  - India (0.04)
  - Russia (0.04)
  - China > Tibet Autonomous Region (0.04)
  - Philippines (0.04)
  - Bangladesh > Dhaka Division
    - Dhaka District > Dhaka (0.04)
  - Pakistan > Punjab
    - Lahore Division > Lahore (0.04)
  - Middle East
    - Iran (0.14)
    - Syria (0.04)
    - Lebanon > Beirut Governorate
      - Beirut (0.04)
    - Iraq
      - Baghdad Governorate > Baghdad (0.04)
      - Najaf Governorate > Najaf (0.04)
  - South Korea > Gyeongsangbuk-do
    - Pohang (0.04)
- Africa
  - Liberia (0.14)
  - South Africa > Gauteng
    - Johannesburg (0.04)

Genre:
- Research Report (1.00)

Industry:
- Leisure & Entertainment (0.93)
- Health & Medicine > Consumer Health (0.93)
- Transportation (0.68)
- Media > News (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
- Government
  - Military (1.00)
  - Regional Government
    - North America Government > United States Government (1.00)
    - Asia Government > Middle East Government
      - Iraq Government (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Vision (0.93)
  - Natural Language
    - Text Processing (1.00)
    - Large Language Model (0.71)