AITopics | vrsbench

VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding

Neural Information Processing Systems | Mar-17-2026, 21:00:57 GMT

We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images. Although several vision-language datasets in remote sensing have been proposed to pursue this goal, existing datasets are typically tailored to single tasks, lack detailed object information, or suffer from inadequate quality control. Exploring these improvement opportunities, we present a Versatile vision-language Benchmark for Remote Sensing image understanding, termed VRSBench. This benchmark comprises 29,614 images, with 29,614 human-verified detailed captions, 52,472 object references, and 123,221 question-answer pairs.

artificial intelligence, name change, proceedings

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

05b7f821234f66b78f99e7803fffa78a-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Feb-7-2026, 09:45:39 GMT

annotation, category, dataset

Industry:

Leisure & Entertainment > Sports (0.69)
Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

05b7f821234f66b78f99e7803fffa78a-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Oct-9-2025, 17:36:21 GMT

annotation, category, dataset

Industry:

Leisure & Entertainment > Sports (0.69)
Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.99)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding

Neural Information Processing Systems | May-26-2025, 15:18:43 GMT

We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images. Although several vision-language datasets in remote sensing have been proposed to pursue this goal, existing datasets are typically tailored to single tasks, lack detailed object information, or suffer from inadequate quality control. Exploring these improvement opportunities, we present a Versatile vision-language Benchmark for Remote Sensing image understanding, termed VRSBench. This benchmark comprises 29,614 images, with 29,614 human-verified detailed captions, 52,472 object references, and 123,221 question-answer pairs. We further evaluated state-of-the-art models on this benchmark for three vision-language tasks: image captioning, visual grounding, and visual question answering.

versatile vision-language benchmark dataset, vision-language model, vrsbench

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology: Information Technology > Artificial Intelligence > Vision > Image Understanding (0.70)
