Supplementary Material - WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models

Feb-18-2026, 20:13:20 GMT–Neural Information Processing Systems

This has been addressed in prior work [4, 3] by finetuning VLMs on a given corpus for a given task [5] and conducting zero-shot evaluations on a new corpus [7]. However, the mere use of an unseen corpus for evaluation does not imply it is OOD. Q1 What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)? Please provide a description. (a) We provide 384k image-text pairs. Q3 Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set? If the dataset is a sample, then what is the larger set?

artificial intelligence, machine learning, natural language
