A Modular Approach for Multimodal Summarization of TV Shows

Jul-6-2024–arXiv.org Artificial Intelligence

In this paper, we address the task of summarizing television shows, which touches on key areas of AI research: complex reasoning, multiple modalities, and long narratives. We present a modular approach in which separate components perform specialized sub-tasks, which we argue affords greater flexibility than end-to-end methods. Our modules detect scene boundaries, reorder scenes so as to minimize the number of cuts between different events, convert visual information to text, summarize the dialogue in each scene, and fuse the scene summaries into a final summary for the entire episode. We also present a new metric, PRISMA (Precision and Recall EvaluatIon of Summary FActs), which measures both the precision and the recall of generated summaries, which we decompose into atomic facts. Tested on the recently released SummScreen3D dataset, our method produces higher-quality summaries than comparison models, as measured with ROUGE and our new fact-based metric, and as assessed by human evaluators.
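To make the idea of fact-level precision and recall concrete, the following is a minimal sketch and not the paper's PRISMA implementation. It assumes that both the generated and the reference summary have already been decomposed into atomic facts, and that a fact counts as supported when a judge function says so. The names `fact_precision_recall`, `is_supported`, and `exact_match` are hypothetical, and the case-insensitive string match is only a stand-in for whatever judge is actually used.

```python
from typing import Callable, Sequence, Tuple


def fact_precision_recall(
    generated_facts: Sequence[str],
    reference_facts: Sequence[str],
    is_supported: Callable[[str, Sequence[str]], bool],
) -> Tuple[float, float]:
    """Return (precision, recall) over atomic facts.

    Assumption: precision is the share of generated facts supported by the
    reference facts, and recall is the share of reference facts supported
    by the generated facts.
    """
    if generated_facts:
        supported = sum(is_supported(f, reference_facts) for f in generated_facts)
        precision = supported / len(generated_facts)
    else:
        precision = 0.0

    if reference_facts:
        covered = sum(is_supported(f, generated_facts) for f in reference_facts)
        recall = covered / len(reference_facts)
    else:
        recall = 0.0

    return precision, recall


def exact_match(fact: str, pool: Sequence[str]) -> bool:
    """Toy judge (hypothetical): case-insensitive exact string match."""
    normalized = {p.strip().lower() for p in pool}
    return fact.strip().lower() in normalized


generated = ["A meets B", "C leaves town", "D is arrested"]
reference = ["a meets b", "C leaves town", "E gets married", "F dies"]
precision, recall = fact_precision_recall(generated, reference, exact_match)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, two of the three generated facts appear in the reference and two of the four reference facts appear in the generated summary. Passing the judge as a parameter keeps the counting logic separate from the question of how support is decided.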

module, summarization, transcript, (16 more...)

Country:
- North America > United States
  - California (0.04)
  - New York
    - Richmond County > New York City (0.04)
    - Queens County > New York City (0.04)
    - New York County > New York City (0.04)
    - Kings County > New York City (0.04)
    - Bronx County > New York City (0.04)
  - New Mexico > Santa Fe County
    - Santa Fe (0.04)
  - Colorado > Denver County
    - Denver (0.04)
- Europe
  - Italy (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Singapore (0.04)
  - China > Hong Kong (0.04)

Genre:
- Research Report (1.00)

Industry:
- Media > Television (1.00)
- Leisure & Entertainment (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning (1.00)
  - Representation & Reasoning (0.93)
  - Natural Language > Large Language Model (0.47)
