II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Liu, Ziqiang, Fang, Feiteng, Feng, Xi, Du, Xinrun, Zhang, Chenhao, Wang, Zekun, Bai, Yuelin, Zhao, Qixuan, Fan, Liyang, Gan, Chengguang, Lin, Hongquan, Li, Jiaming, Ni, Yuansheng, Wu, Haihong, Narsupalli, Yaswanth, Zheng, Zhigang, Li, Chengming, Hu, Xiping, Xu, Ruifeng, Chen, Xiaojun, Yang, Min, Liu, Jiaheng, Liu, Ruibo, Huang, Wenhao, Zhang, Ge, Ni, Shiwen

arXiv.org Artificial Intelligence

The rapid development of multimodal large language models (MLLMs) has consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, the higher-order perceptual capabilities of MLLMs remain largely unexplored. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate a model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we report three main findings. First, there is a substantial gap between the performance of MLLMs and humans on II-Bench: the best MLLM reaches 74.8% accuracy, whereas human accuracy averages 90% and peaks at 98%. Second, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Third, most models achieve higher accuracy when image sentiment polarity hints are incorporated into the prompts, which points to a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.


One Future at a Time

#artificialintelligence

You may have noticed our new website and a fresh look and feel to our brand. Our marketing team made these changes to more accurately reflect our vision and the work Lumiata is doing every day to bring that vision to life. Our logo symbolizes an element: science and rigor in our work. Our graphics symbolize the possibilities of a future of healthcare that is powered by artificial intelligence, where all actors are seamlessly connected and empowered to collaborate towards Predictive Care. We hope you find the changes clarifying, useful, and engaging. Visit often, as we will be providing opportunities to learn more about AI in healthcare, the advances payer and provider organizations are making through predictive analytics, and ways to engage with colleagues in healthcare who share your interests and passion.