Dual-Stream Transformer for Generic Event Boundary Captioning
Gu, Xin, Ye, Hanhua, Chen, Guang, Wang, Yufei, Zhang, Libo, Wen, Longyin
arXiv.org Artificial Intelligence
GEBC requires the captioning model to have a comprehension of instantaneous status changes around the given video boundary, which makes it much more challenging than the conventional video captioning task. In this paper, a Dual-Stream Transformer with improvements on both video content encoding and caption generation is proposed: (1) We utilize three pre-trained models to extract the video features. Faster-RCNN [9] is utilized to extract regions of interest from the given videos. Additionally, we utilize the "types of boundary" labels as the language-modality input to help the model generate more accurate descriptions for boundaries. In order to learn discriminative representations for video boundaries, the extracted multi-modal features are input into our specially designed Dual-Stream Transformer.
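The abstract describes fusing multi-modal video features (including Faster-RCNN region features) with a "type of boundary" label embedding before they enter the Dual-Stream Transformer. A minimal NumPy sketch of one plausible dual-stream cross-attention fusion follows; all dimensions, feature sources, and the fusion order are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv):
    # Scaled dot-product attention: queries from one stream,
    # keys/values taken from the other stream.
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ kv

rng = np.random.default_rng(0)
d = 64  # hypothetical feature dimension
appearance = rng.standard_normal((8, d))     # stand-in for frame-level features (8 frames)
regions = rng.standard_normal((5, d))        # stand-in for Faster-RCNN region-of-interest features
boundary_type = rng.standard_normal((1, d))  # stand-in embedding of the "type of boundary" label

# Two streams attend to each other, then the fused tokens
# attend to the boundary-type token (language-modality input).
stream_a = cross_attention(appearance, regions)
stream_b = cross_attention(regions, appearance)
fused = np.concatenate([stream_a, stream_b], axis=0)
fused = fused + cross_attention(fused, boundary_type)

print(fused.shape)  # → (13, 64)
```

The residual add after attending to the boundary-type token is one common way to inject a conditioning signal without discarding the visual content; the actual paper may combine the modalities differently.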
Mar-24-2023
- Genre:
- Research Report (0.50)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Vision (0.95)