Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
Yang Jin, Yongzhi Li

Aug-18-2025, 08:06:58 GMT–Neural Information Processing Systems

Spatio-temporal video grounding (STVG) focuses on retrieving the spatio-temporal tube of a specific object depicted by a free-form textual expression. Existing approaches mainly treat this complicated task as a parallel frame-grounding problem and thus suffer from two types of inconsistency: feature alignment inconsistency and prediction inconsistency.

artificial intelligence, machine learning, natural language

Country:
- Asia > China
  - Heilongjiang Province > Daqing (0.04)
  - Beijing > Beijing (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning (1.00)
  - Vision (0.99)
