TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives

Maitreya Patel

Neural Information Processing Systems 

Contrastive Language-Image Pretraining (CLIP) models maximize the mutual information between textual and visual modalities to learn representations.
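The contrastive objective behind this statement is commonly written as a symmetric InfoNCE loss over a batch of matched image-text pairs, which lower-bounds the mutual information between the two modalities. The following is a minimal NumPy sketch of that standard formulation, not the TripletCLIP training code. The function names, the fixed temperature of 0.07, and the toy embeddings are assumptions made for illustration.

```python
import numpy as np


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Hypothetical helper: symmetric InfoNCE over a batch in which
    # row i of image_emb is paired with row i of text_emb.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Cosine similarities scaled by the temperature (assumed fixed here).
    logits = image_emb @ text_emb.T / temperature

    idx = np.arange(logits.shape[0])
    # Image-to-text: each row should assign high probability to its own caption.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each column should assign high probability to its own image.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


# Toy batch: identical embeddings are perfectly aligned,
# while rolling the captions by one position mismatches every pair.
images = np.eye(4)
aligned_loss = clip_contrastive_loss(images, np.eye(4))
mismatched_loss = clip_contrastive_loss(images, np.roll(np.eye(4), 1, axis=0))
```

In this toy batch the aligned pairs give a loss close to zero, while the mismatched pairs give a much larger one, which is the behavior the objective rewards.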
