VLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
Neural Information Processing Systems
VLC adds a synthetic hard negative image generated from the synthetic text. Each instance therefore yields two image-to-text retrieval examples (one per image) and, more importantly, two text-to-image retrieval examples (one per text).
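As a rough illustration of the four retrieval examples described above, the sketch below scores a single instance from a 2x2 image-text similarity matrix. The function name `score_instance`, the matrix layout, the example similarity values, and the all-or-nothing aggregation into `i2t`, `t2i`, and `group` flags are assumptions made for illustration, not details taken from the paper. Any image-text similarity model is assumed to supply the scores.

```python
from typing import Dict, List


def score_instance(sim: List[List[float]]) -> Dict[str, bool]:
    """Score one instance from a 2x2 similarity matrix (hypothetical layout).

    sim[i][t] is the similarity between image i and text t. Index 0 is the
    original image/text and index 1 is the synthetic hard negative, so
    image i and text i form the matching pair.
    """
    # Image-to-text: for each image, the matching text must beat the other text.
    i2t = [sim[i][i] > sim[i][1 - i] for i in range(2)]
    # Text-to-image: for each text, the matching image must beat the other image.
    t2i = [sim[t][t] > sim[1 - t][t] for t in range(2)]
    return {
        "i2t": all(i2t),                  # both image-to-text examples correct
        "t2i": all(t2i),                  # both text-to-image examples correct
        "group": all(i2t) and all(t2i),   # all four examples correct
    }


# Made-up similarity values, for illustration only.
example = [[0.31, 0.27],
           [0.30, 0.29]]
result = score_instance(example)
```

In this made-up case the second image scores higher with the original text than with its own synthetic text, so one image-to-text example fails while both text-to-image examples succeed. Scoring the two directions separately is what makes such asymmetries visible.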
Mar-27-2025, 03:22:13 GMT
- Country:
- Europe > Switzerland > Zürich > Zürich (0.14)
- Genre:
- Overview (0.68)
- Research Report > New Finding (0.46)
- Industry:
- Government (0.46)
- Information Technology (0.68)
- Technology: