Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
Yang Jin, Yongzhi Li
Neural Information Processing Systems
Spatio-temporal video grounding (STVG) aims to retrieve the spatio-temporal tube of a specific object described by a free-form textual expression. Existing approaches mainly treat this complex task as a set of parallel frame-grounding problems and therefore suffer from two kinds of inconsistency: feature alignment inconsistency and prediction inconsistency.
Aug-18-2025, 08:06:58 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (1.00)
- Vision (0.99)