Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions
Gao, Jin, Gan, Lei, Li, Yuankai, Ye, Yixin, Wang, Dequan
arXiv.org Artificial Intelligence
Large multimodal models (LMMs) excel at following human instructions. However, as multimodal interaction grows and context lengths increase, self-contradictory instructions can arise, posing a particular challenge for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate the capability of LMMs to recognize conflicting commands. It comprises 20,000 conflicts, evenly distributed between language and vision paradigms, and is constructed with a novel automatic dataset creation framework that expedites the process and enables us to cover a wide range of instruction forms. Our comprehensive evaluation reveals that current LMMs consistently struggle to identify multimodal instruction discordance due to a lack of self-awareness. We therefore propose Cognitive Awakening Prompting, which injects cognition from external sources, substantially enhancing dissonance detection. Our website, dataset, and code are available.
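To make the idea concrete, here is a minimal, hypothetical sketch of "injecting cognition" through a prompt prefix, in the spirit of the Cognitive Awakening Prompting the abstract describes. The prompt wording, the `build_prompt` helper, and the keyword-based scorer are all illustrative assumptions, not the paper's actual method or evaluation protocol.

```python
# Hypothetical sketch only: the prompt text and scoring heuristic below are
# assumptions for illustration, not the paper's Cognitive Awakening Prompting.

BASE_PROMPT = "Follow the user's instructions."
CAP_PREFIX = (
    "Before answering, check whether the instructions contradict each other "
    "or the accompanying input. If they conflict, point out the conflict "
    "instead of answering."
)

def build_prompt(instruction_a: str, instruction_b: str, use_cap: bool = True) -> str:
    """Assemble a query containing two (possibly conflicting) instructions,
    optionally prefixed with the awareness-injecting instruction."""
    header = f"{CAP_PREFIX}\n{BASE_PROMPT}" if use_cap else BASE_PROMPT
    return f"{header}\n1. {instruction_a}\n2. {instruction_b}"

def is_conflict_flagged(model_reply: str) -> bool:
    """Naive scorer: did the model's reply mention a contradiction?
    A real benchmark would use a far more robust judging procedure."""
    keywords = ("conflict", "contradict", "inconsistent")
    return any(k in model_reply.lower() for k in keywords)

# Example of a self-contradictory instruction pair (language paradigm):
prompt = build_prompt("Answer in English only.", "Answer in French only.")
print(is_conflict_flagged("These two instructions contradict each other."))
```

In this framing, the benchmark would compare flag rates with and without the awareness prefix; the abstract's finding is that the prefix-style cognition injection markedly improves detection.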
Aug-5-2024