An Experimental Study on Generating Plausible Textual Explanations for Video Summarization
Eleftheriadis, Thomas, Apostolidis, Evlampios, Mezaris, Vasileios
–arXiv.org Artificial Intelligence
For the needs of this study, we extend an existing framework for multigranular explanation of video summarization by integrating a SOTA Large Multimodal Model (LLaVA-OneVision) and prompting it to produce natural-language descriptions of the obtained visual explanations. We then focus on one of the most desired characteristics of explainable AI, the plausibility of the obtained explanations, which relates to their alignment with human reasoning and expectations. Using the extended framework, we propose an approach for evaluating the plausibility of visual explanations by quantifying the semantic overlap between their textual descriptions and the textual descriptions of the corresponding video summaries, with the help of two methods for creating sentence embeddings (SBERT, SimCSE). Based on the extended framework and the proposed plausibility evaluation approach, we conduct an experimental study using a SOTA method (CA-SUM) and two datasets (SumMe, TVSum) for video summarization, to examine whether the more faithful explanations are also the more plausible ones, and to identify the most appropriate approach for generating plausible textual explanations for video summarization.
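The plausibility evaluation described above can be illustrated with a minimal sketch: the textual description of a visual explanation and that of the corresponding video summary are each mapped to a sentence embedding (SBERT or SimCSE in the paper), and their semantic overlap is measured as cosine similarity. The `embed` step is stubbed here with toy vectors, and the function names are illustrative, not the paper's actual code.

```python
# Hedged sketch of plausibility scoring via semantic overlap of sentence
# embeddings. In the study, embeddings would come from SBERT or SimCSE;
# here, toy vectors stand in for those embeddings.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def plausibility_score(explanation_emb: np.ndarray, summary_emb: np.ndarray) -> float:
    """Plausibility as semantic overlap between the explanation's textual
    description and the summary's textual description (both embedded)."""
    return cosine_similarity(explanation_emb, summary_emb)

# Toy stand-ins for the two sentence embeddings.
explanation_emb = np.array([0.2, 0.8, 0.1])
summary_emb = np.array([0.25, 0.75, 0.05])
score = plausibility_score(explanation_emb, summary_emb)  # near 1.0 when descriptions align
```

A higher score indicates that the explanation's textual description is semantically closer to the summary's description, i.e. a more plausible explanation under this measure.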
Oct-1-2025