TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM

Ye Wang, Boshen Xu, Zihao Yue, Zihan Xiao, Ziheng Wang, Liang Zhang, Dingyi Yang, Wenxuan Wang, Qin Jin

Mar-17-2025–arXiv.org Artificial Intelligence

We introduce TimeZero, a reasoning-guided large vision-language model (LVLM) designed for the temporal video grounding (TVG) task. This task requires precisely localizing the relevant segment of a long video given a language query. TimeZero tackles this challenge by extending the inference process, enabling the model to reason about video-language relationships solely through reinforcement learning. To evaluate the effectiveness of TimeZero, we conduct experiments on two benchmarks, where TimeZero achieves state-of-the-art performance on Charades-STA. Code is available at https://github.com/www-Ye/TimeZero.
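
To make "localizing a segment" concrete: a TVG prediction is a (start, end) time span, and a common way to score it against an annotated span is temporal intersection-over-union (IoU). The abstract does not say which metric or reward TimeZero uses, so the sketch below is only an illustration of that general measure; the function name `temporal_iou` and the use of seconds as the unit are assumptions made for this example.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments, both in seconds.

    Illustrative helper only; not taken from the TimeZero code base.
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    if pred_end < pred_start or gt_end < gt_start:
        raise ValueError("each segment must satisfy start <= end")

    # Length of the overlap; zero when the segments are disjoint.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Union is the sum of both lengths minus the shared part.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0
```

For example, a prediction of 2 s to 8 s against an annotation of 4 s to 10 s overlaps for 4 s out of a combined 8 s, so `temporal_iou((2.0, 8.0), (4.0, 10.0))` evaluates to 0.5.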

large language model, machine learning, natural language

Country:
- Asia > China (0.28)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (1.00)