WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models

Dec-27-2025, 14:38:00 GMT–Neural Information Processing Systems

Cross-modal (image-to-text and text-to-image) retrieval is an established task used in evaluation benchmarks to test the performance of vision-language models (VLMs).

artificial intelligence, natural language, proceedings

Technology:
- Information Technology > Artificial Intelligence > Natural Language (0.42)