Empower Words: DualGround for Structured Phrase and Sentence-Level Temporal Grounding

Neural Information Processing Systems

Video Temporal Grounding (VTG) aims to localize temporal segments in long, untrimmed videos that align with a given natural language query.