TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
