Text-Visual Prompting for Efficient 2D Temporal Video Grounding
