Supplemental: A Benchmark for Compositional Text-to-image Retrieval

Oct-9-2025, 01:29:58 GMT–Neural Information Processing Systems

GQA

GQA provides annotations of objects and their attributes in images. We use these annotations to construct compositional queries such as "square white plate". We train on the GQA train split, with the unseen test queries and their corresponding images removed, which leaves around 67K training images and 27K queries.

CLEVR

On CLEVR, we test on 96 classes over 22,500 images.
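The GQA query construction and train-split filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes a GQA-style scene-graph layout in which each image maps to objects that carry a `name` and a list of `attributes`. It also assumes that a query joins all of an object's attributes, in annotation order, before the object name, and that removing the unseen test queries means dropping both those queries and every training image that matches them. The helper names `build_queries` and `filter_train` are hypothetical.

```python
from collections import defaultdict


def build_queries(scene_graphs):
    """Map each attribute-object query string to the set of image ids containing it.

    Assumed (hypothetical) input layout, loosely modeled on GQA scene graphs:
    {image_id: {"objects": {obj_id: {"name": str, "attributes": [str, ...]}}}}
    """
    query_to_images = defaultdict(set)
    for image_id, graph in scene_graphs.items():
        for obj in graph["objects"].values():
            attrs = obj.get("attributes", [])
            if not attrs:
                # Objects without attributes yield no compositional query here.
                continue
            # Assumption: attributes in annotation order, followed by the noun.
            query = " ".join(attrs + [obj["name"]])
            query_to_images[query].add(image_id)
    return dict(query_to_images)


def filter_train(train_queries, unseen_test_queries):
    """Drop unseen test queries and every training image that matches one of them."""
    banned_images = set()
    for q in unseen_test_queries:
        banned_images |= train_queries.get(q, set())
    filtered = {}
    for q, imgs in train_queries.items():
        if q in unseen_test_queries:
            continue
        kept = imgs - banned_images
        if kept:  # discard queries left with no images
            filtered[q] = kept
    return filtered, banned_images


# Toy example with made-up images and annotations.
graphs = {
    "img1": {"objects": {"o1": {"name": "plate", "attributes": ["square", "white"]}}},
    "img2": {"objects": {"o2": {"name": "plate", "attributes": ["round"]},
                         "o3": {"name": "cup", "attributes": []}}},
    "img3": {"objects": {"o4": {"name": "plate", "attributes": ["square", "white"]},
                         "o5": {"name": "fork", "attributes": ["silver"]}}},
}
queries = build_queries(graphs)
train, removed = filter_train(queries, {"square white plate"})
print(sorted(train))    # ['round plate']
print(sorted(removed))  # ['img1', 'img3']
```

Dropping whole images, rather than only the query labels, keeps unseen attribute-object combinations out of the visual training signal. The source does not say how other queries attached to removed images are handled, so the sketch simply discards any query that is left with no images.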
