Localizing Moments in Long Video Via Multimodal Guidance
