A Multi-level Alignment Training Scheme for Video-and-Language Grounding
