Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding
