RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios
Zhao, Fei, Lu, Chengqiang, Shen, Yufan, Wang, Qimeng, Qian, Yicheng, Zhang, Haoxin, Gao, Yan, Wu, Yi, Hu, Yao, Wu, Zhen, Xing, Shangyu, Dai, Xinyu
arXiv.org Artificial Intelligence
While various multimodal multi-image evaluation datasets have emerged, they are primarily based on English, and there has yet to be a Chinese multi-image dataset. To fill this gap, we introduce RealBench, the first Chinese multimodal multi-image dataset, which contains 9393 samples and 69910 images. RealBench distinguishes itself by incorporating real user-generated content, ensuring high relevance to real-world applications. Additionally, the dataset covers a wide variety of scenes, image resolutions, and image structures, further increasing the difficulty of multi-image understanding. Finally, we comprehensively evaluate 21 multimodal LLMs of different sizes on RealBench, including closed-source models that support multi-image inputs as well as open-source visual and video models. The experimental results indicate that even the most powerful closed-source models still struggle with multi-image Chinese scenarios. Moreover, there remains a noticeable performance gap of around 71.8% on average between open-source visual/video models and closed-source models. These results show that RealBench provides an important research foundation for further exploring multi-image understanding capabilities in the Chinese context.
Sep-23-2025
- Country:
  - Asia
    - China > Jiangsu Province
      - Nanjing (0.04)
    - Singapore (0.04)
  - Europe
  - North America > United States
    - California > Los Angeles County
      - Long Beach (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Washington > King County
      - Seattle (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Information Technology (0.46)
- Technology: