Exploiting Feature Diversity for Make-up Temporal Video Grounding
Shu, Xiujun, Wen, Wei, Guo, Taian, He, Sunan, Wu, Chen, Qiao, Ruizhi
arXiv.org Artificial Intelligence
This technical report presents the third-place solution for Make-up Temporal Video Grounding (MTVG), a new task introduced in the 4th Person in Context (PIC) Challenge at ACM MM 2022. MTVG aims to localize the temporal boundaries of a make-up step in an untrimmed video given a textual description. The main challenge of this task is capturing the fine-grained video-text semantics of make-up steps. However, current methods mainly extract video features using action-based pre-trained models. Because actions are more coarse-grained than make-up steps, action-based features do not provide sufficient fine-grained cues. To address this issue, we propose to achieve fine-grained representations by exploiting feature diversity. Specifically, we propose a series of methods spanning feature extraction, network optimization, and model ensembling. As a result, we achieved 3rd place in the MTVG competition.
Aug-12-2022
- Country:
- Europe > Portugal
- North America > United States
- New York > New York County
- New York City (0.04)
- Texas > Sterling County (0.04)
- Genre:
- Research Report (0.40)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (0.71)
- Vision (0.72)