How Much Can CLIP Benefit Vision-and-Language Tasks?
