VideoBERT: A Joint Model for Video and Language Representation Learning

Open in new window