MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
