VLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
