Supplementary of VRSBench: A Versatile Benchmark for Vision Language Understanding of Remote Sensing Images
–Neural Information Processing Systems
VRSBench consists of 29,614 remote sensing images with detailed captions, 52,472 object references, and 123,221 visual question-answer pairs. This section documents the dataset in accordance with best practices to ensure transparency, reproducibility, and ethical usage. Images_val.zip contains all raw images in the validation split. Model Evaluation: The dataset can serve as a benchmark for comparing the performance of different vision-language models on a standardized set of tasks. These annotations undergo manual review by human annotators.
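As a quick sanity check on a downloaded copy of Images_val.zip, the archive contents can be listed without extracting them. The following is a minimal sketch, not part of the dataset's tooling: the helper name `list_images` and the set of image suffixes are assumptions for illustration, since the excerpt does not state the image format or the archive's internal layout.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed image extensions; the source does not state the actual format.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def list_images(archive_path):
    """Return the sorted names of image files stored in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Zip member names always use forward slashes.
            if PurePosixPath(name).suffix.lower() in IMAGE_SUFFIXES
        )
```

Calling `list_images("Images_val.zip")` on a local copy returns the member names, whose count can then be compared against the validation split size reported by the authors.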
Nov-13-2025, 08:45:51 GMT
- Country:
- North America > United States > North Dakota > Stark County (0.04)
- Technology:
- Information Technology > Artificial Intelligence
- Natural Language (1.00)
- Vision (1.00)