[Figure: (i) Training phase: VMR uses query-video pairs with fine-grained timestamps; WSCMR uses query-video pairs only. (ii) Evaluation phase: VMR is tested on Test-Trivial; WSCMR is tested on Test-Trivial, Novel-Words, and Novel-Composition.]
Neural Information Processing Systems
With the exponential growth of video content, video moment retrieval (VMR) has gained significant attention. VMR aims to localize the video moments relevant to a natural language query. Existing weakly supervised VMR methods reduce the reliance on precise temporal annotations by designing various feature-modeling and cross-modal interaction modules. However, these methods generalize poorly to compositional queries that contain novel syntactic structures or vocabulary, as encountered in real-world scenarios. To this end, we propose a new task: weakly supervised compositional moment retrieval (WSCMR). In this task, models are trained using only video-query pairs, without precise temporal annotations, yet are expected to generalize to complex compositional queries.
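The difference in supervision between VMR and WSCMR can be sketched as two training-sample types. In fully supervised VMR, each video-query pair carries fine-grained start/end timestamps. In WSCMR, only the pair itself is available. This is a minimal illustration, not the authors' data format. The class names (`VMRSample`, `WeakSample`), the field names, and the helpers `to_weak` and `has_timestamps` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMRSample:
    """Hypothetical fully supervised VMR training sample."""
    video_id: str
    query: str
    start_sec: float  # fine-grained temporal annotation: moment start (seconds)
    end_sec: float    # fine-grained temporal annotation: moment end (seconds)


@dataclass(frozen=True)
class WeakSample:
    """Hypothetical weakly supervised (WSCMR-style) sample: video-query pair only."""
    video_id: str
    query: str


def to_weak(sample: VMRSample) -> WeakSample:
    """Drop the timestamps, keeping only the video-query pair."""
    return WeakSample(video_id=sample.video_id, query=sample.query)


def has_timestamps(sample: object) -> bool:
    """Return True if the sample carries a start/end temporal annotation."""
    return hasattr(sample, "start_sec") and hasattr(sample, "end_sec")


if __name__ == "__main__":
    full = VMRSample("v001", "a person opens the door", 3.2, 7.8)
    weak = to_weak(full)
    print(has_timestamps(full), has_timestamps(weak))  # True False
```

Under this framing, a weakly supervised model must learn moment localization from `WeakSample`-style data alone. Evaluation then checks whether it still localizes moments for queries with unseen words or unseen compositions.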
Jun-22-2026, 14:28:00 GMT
- Genre:
  - Research Report > Experimental Study (1.00)
  - Overview (0.67)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning (1.00)