UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

Jun-23-2024–arXiv.org Artificial Intelligence

With the surge in video data, video summarization techniques, including visual-modal (VM) and textual-modal (TM) summarization, are attracting increasing attention. However, unimodal summarization inevitably loses the rich semantics of the video. In this paper, we focus on a more comprehensive video summarization task named Bimodal Semantic Summarization of Videos (BiSSV). Specifically, we first construct a large-scale dataset, BIDS, in (video, VM-Summary, TM-Summary) triplet format. Unlike traditional processing methods, our construction procedure contains a VM-Summary extraction algorithm that aims to preserve the most salient content within long videos. Based on BIDS, we propose a unified framework, UBiSS, for the BiSSV task, which models the saliency information in the video and generates a TM-Summary and a VM-Summary simultaneously. We further optimize our model with a list-wise ranking-based objective to improve its capacity to capture highlights. Lastly, we propose a metric, $NDCG_{MS}$, to provide a joint evaluation of the bimodal summary. Experiments show that our unified framework achieves better performance than multi-stage summarization pipelines. Code and data are available at https://github.com/MeiYutingg/UBiSS.
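The abstract names a list-wise ranking-based objective and an NDCG-based metric but does not define either. As a rough illustration only, the sketch below shows the standard NDCG computation and a ListNet-style list-wise loss applied to per-segment saliency scores. It is not the paper's $NDCG_{MS}$ or its training objective; the function names (`dcg`, `ndcg`, `softmax`, `listwise_loss`), the linear gain, and the choice of a ListNet-style loss are all assumptions made for this example.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores given in ranked order.

    Assumption: linear gain (rel) rather than exponential gain (2**rel - 1).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(predicted_scores, true_relevances, k=None):
    """Standard NDCG@k: DCG of the predicted ordering over the ideal DCG."""
    if len(predicted_scores) != len(true_relevances):
        raise ValueError("score lists must have the same length")
    k = len(true_relevances) if k is None else k
    # Rank segments by predicted saliency, highest first.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked = [true_relevances[i] for i in order[:k]]
    ideal = sorted(true_relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(predicted_scores, true_relevances):
    """ListNet-style top-1 loss (hypothetical stand-in for the paper's objective).

    Cross-entropy between the softmax of the ground-truth scores and the
    softmax of the predicted scores, computed over the whole list at once.
    """
    target = softmax(true_relevances)
    pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


if __name__ == "__main__":
    truth = [3.0, 2.0, 1.0]      # hypothetical ground-truth saliency
    good = [0.9, 0.5, 0.1]       # same ordering as the ground truth
    bad = [0.1, 0.5, 0.9]        # reversed ordering
    print(ndcg(good, truth), ndcg(bad, truth))
    print(listwise_loss(good, truth), listwise_loss(bad, truth))
```

A list-wise loss scores the whole ordering of segments at once rather than individual segments or pairs, which is why it pairs naturally with a ranking metric such as NDCG at evaluation time.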


Country:
- Oceania > Australia
  - Western Australia > Perth (0.04)
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - New Mexico > Bernalillo County
    - Albuquerque (0.04)
- Europe
  - Switzerland > Zürich
    - Zürich (0.14)
  - Netherlands > North Holland
    - Amsterdam (0.04)
- Asia
  - Thailand > Phuket
    - Phuket (0.05)
  - India > Karnataka
    - Bengaluru (0.04)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology
  - Data Science (0.93)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Vision (0.96)
    - Representation & Reasoning (0.93)
    - Machine Learning > Neural Networks (0.67)
