Supplementary of VRSBench: A Versatile Benchmark for Vision Language Understanding of Remote Sensing Images

Neural Information Processing Systems 

VRSBench consists of 29,614 remote sensing images with detailed captions, 52,472 object references, and 123,221 visual question-answer pairs. This section documents the dataset in accordance with best practices to ensure transparency, reproducibility, and ethical usage. The archive Images_val.zip contains all raw images in the validation split. Model Evaluation: The dataset can serve as a benchmark for comparing the performance of different vision-language models on a standardized set of tasks. The annotations undergo manual review by human annotators.
