Exploiting Feature Diversity for Make-up Temporal Video Grounding

Xiujun Shu, Wei Wen, Taian Guo, Sunan He, Chen Wu, Ruizhi Qiao

Aug-12-2022–arXiv.org Artificial Intelligence

This technical report presents the 3rd-place solution for Make-up Temporal Video Grounding (MTVG), a new task introduced in the 4th Person in Context (PIC) Challenge at ACM MM 2022. MTVG aims to localize the temporal boundaries of a make-up step in an untrimmed video given a textual description. The biggest challenge of this task is the fine-grained video-text semantics of make-up steps. However, current methods mainly extract video features using action-based pre-trained models. Because actions are more coarse-grained than make-up steps, action-based features do not provide sufficient fine-grained cues. To address this issue, we propose to achieve fine-grained representations by exploiting feature diversity. Specifically, we propose a series of methods spanning feature extraction, network optimization, and model ensembling. As a result, we achieved 3rd place in the MTVG competition.
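To make the task statement concrete: a grounding prediction is a (start, end) segment in the video, and such predictions are commonly scored by their temporal intersection-over-union (IoU) with the annotated segment. The sketch below is illustrative only. The abstract does not state the challenge's exact metric, and the function name `temporal_iou` and the example timestamps are assumptions, not taken from the report.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, in seconds.

    Hypothetical helper for illustration; not the authors' code.
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Made-up example: predicted step at 2-6 s, annotated step at 4-8 s.
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

A prediction is then usually counted as correct when its IoU exceeds a chosen threshold, which is why precise boundaries matter for fine-grained steps.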

artificial intelligence, machine learning, natural language

Country:
- Europe > Portugal
  - Lisbon > Lisbon (0.05)
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Texas > Sterling County (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language (0.71)
  - Vision (0.72)
