II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
Xi Feng
–Neural Information Processing Systems
The rapid development of multimodal large language models (MLLMs) has consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to assess the capabilities of MLLMs more accurately. However, the higher-order perceptual capabilities of MLLMs remain largely unexplored. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate a model's higher-order perception of images. Extensive experiments on II-Bench across multiple MLLMs yield several significant findings. First, a substantial gap is observed between the performance of MLLMs and humans on II-Bench.
May-29-2025, 13:22:08 GMT
- Country:
  - Asia > China (0.28)
  - North America > United States
    - New York (0.14)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Consumer Products & Services (0.67)
  - Energy (0.92)
  - Health & Medicine
    - Consumer Health (0.67)
    - Therapeutic Area > Psychiatry/Psychology
      - Mental Health (0.46)
  - Information Technology > Security & Privacy (1.00)
  - Law (1.00)
  - Leisure & Entertainment (1.00)
  - Social Sector (0.92)
  - Transportation (0.67)
- Technology: