Modeling Caption Diversity in Contrastive Vision-Language Pretraining
