VLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval

Neural Information Processing Systems 

VLC is to add a synthetic hard negative image generated from the synthetic text, resulting in two image-to-text retrieval examples (one for each image) and, more importantly, two text-to-image retrieval examples (one for each text).

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found