Supplementary Material for VRSBench: A Versatile Benchmark for Vision Language Understanding of Remote Sensing Images
Neural Information Processing Systems
VRSBench is designed to facilitate the development and evaluation of vision-language models in remote sensing. It provides a comprehensive set of annotations covering detailed captions, visual grounding, and visual question answering. This section documents the dataset in accordance with best practices to ensure transparency, reproducibility, and ethical usage. Detailed descriptions of each folder or file are given below. Images_val.zip contains all raw images in the validation split. VRSBench_EVAL_Cap.json contains all evaluation annotations for the captioning task in standard JSON format. VRSBench_EVAL_referring.json contains all evaluation annotations for the visual grounding task in standard JSON format. The dataset is intended to advance the state of the art in remote sensing image analysis by providing a rich resource that supports multiple tasks.
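The files listed above can be inspected with a few lines of Python. This is a minimal sketch using only the standard library. It assumes the three files have been downloaded into the current directory, and it makes no assumption about the fields inside the JSON annotations, since their schema is not described here. The helper names `load_annotations`, `list_zip_members`, and `summarize` are hypothetical and not part of any VRSBench tooling.

```python
import json
import zipfile
from pathlib import Path


def load_annotations(path):
    """Read one annotation file and return the parsed JSON object."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def list_zip_members(zip_path):
    """Return the names of the non-directory entries in an image archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]


def summarize(obj):
    """Describe the top-level JSON structure without assuming a schema."""
    if isinstance(obj, list):
        return f"list with {len(obj)} entries"
    if isinstance(obj, dict):
        return f"dict with {len(obj)} keys"
    return type(obj).__name__


if __name__ == "__main__":
    root = Path(".")  # assumed download location, adjust as needed
    for name in ("VRSBench_EVAL_Cap.json", "VRSBench_EVAL_referring.json"):
        path = root / name
        if path.exists():
            print(name, "->", summarize(load_annotations(path)))
    archive = root / "Images_val.zip"
    if archive.exists():
        print("Images_val.zip ->", len(list_zip_members(archive)), "files")
```

Summarizing the top-level structure first is a cautious way to learn how each annotation file is organized before writing task-specific evaluation code against it.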
May-28-2025, 08:17:28 GMT