VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
