Causal Video Summarizer for Video Exploration
Huang, Jia-Hong, Yang, Chao-Han Huck, Chen, Pin-Yu, Brown, Andrew, Worring, Marcel
arXiv.org Artificial Intelligence
Video summarization has recently been proposed as a way to support video exploration. However, traditional video summarization models generate only a fixed summary, which is usually independent of user-specific needs and therefore limits the effectiveness of video exploration. Multi-modal video summarization is one approach to this issue: it takes both a video and a text-based query as input. Effective modeling of the interaction between the video and the text-based query is therefore essential to multi-modal video summarization. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and the query for the task of multi-modal video summarization. The proposed method consists of a probabilistic encoder and a probabilistic decoder. Evaluated on the existing multi-modal video summarization dataset, the proposed approach improves on the state-of-the-art method by +5.4% in accuracy and +4.92% in F1-score.
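The reported gains are stated in terms of accuracy and F1-score. As a minimal sketch of how these two metrics are commonly computed for a summary, the snippet below assumes frame-level binary labels (1 = frame selected for the summary, 0 = not selected). The function name `accuracy_and_f1` and the label format are illustrative assumptions; the abstract does not describe the exact evaluation protocol used for CVS.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(predicted: Sequence[int], reference: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary frame-selection labels.

    Assumption: each entry is 1 if the frame belongs to the summary, else 0.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p == 1 and r == 1)
    false_pos = sum(1 for p, r in pairs if p == 1 and r == 0)
    false_neg = sum(1 for p, r in pairs if p == 0 and r == 1)
    correct = sum(1 for p, r in pairs if p == r)

    accuracy = correct / len(pairs)
    # Guard against division by zero when nothing is selected.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy labels for five frames; not data from the paper.
    acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Accuracy counts every frame equally, while F1 only rewards correctly selected frames, so the two can diverge when summaries are short relative to the full video.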
Jul-4-2023
- Country:
- Europe > Netherlands (0.14)
- North America > United States (0.14)
- Genre:
- Research Report > Promising Solution (0.34)
- Industry:
- Health & Medicine (0.69)
- Technology: