A Modular Approach for Multimodal Summarization of TV Shows
–arXiv.org Artificial Intelligence
In this paper we address the task of summarizing television shows, which touches key areas in AI research: complex reasoning, multiple modalities, and long narratives. We present a modular approach where separate components perform specialized sub-tasks which we argue affords greater flexibility compared to end-to-end methods. Our modules involve detecting scene boundaries, reordering scenes so as to minimize the number of cuts between different events, converting visual information to text, summarizing the dialogue in each scene, and fusing the scene summaries into a final summary for the entire episode. We also present a new metric, PRISMA (Precision and Recall EvaluatIon of Summary FActs), to measure both precision and recall of generated summaries, which we decompose into atomic facts. Tested on the recently released SummScreen3D dataset, our method produces higher quality summaries than comparison models, as measured with ROUGE and our new fact-based metric, and as assessed by human evaluators.
arXiv.org Artificial Intelligence
Jul-6-2024
- Country:
- Asia
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Romania > Sud - Muntenia Development Region
- Giurgiu County > Giurgiu (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Belgium > Brussels-Capital Region
- North America > United States
- California (0.04)
- Colorado > Denver County
- Denver (0.04)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- New York
- Bronx County > New York City (0.04)
- Kings County > New York City (0.04)
- New York County > New York City (0.04)
- Queens County > New York City (0.04)
- Richmond County > New York City (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Leisure & Entertainment (1.00)
- Media > Television (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Large Language Model (0.47)
- Representation & Reasoning (0.93)
- Vision (1.00)
- Information Technology > Artificial Intelligence