Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions
Gao, Jin, Gan, Lei, Li, Yuankai, Ye, Yixin, Wang, Dequan
arXiv.org Artificial Intelligence
Large multimodal models (LMMs) excel at following human instructions. However, as multimodal interaction grows and context lengths increase, self-contradictory instructions can arise, posing a particular challenge for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate the capability of LMMs to recognize conflicting commands. It comprises 20,000 conflicts, evenly distributed between language and vision paradigms, and is constructed with a novel automatic dataset creation framework that expedites the process and enables us to cover a wide range of instruction forms. Our comprehensive evaluation reveals that current LMMs consistently struggle to identify multimodal instruction discordance due to a lack of self-awareness. We therefore propose Cognitive Awakening Prompting, which injects cognition from external sources, substantially enhancing dissonance detection. Our website, dataset, and code are available.
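To make the idea concrete, here is a minimal, hypothetical sketch of "injecting cognition" through a prompt prefix, in the spirit of the Cognitive Awakening Prompting the abstract describes. The prompt wording, the `build_prompt` helper, and the keyword-based scorer are all illustrative assumptions, not the paper's actual method or evaluation protocol.

```python
# Hypothetical sketch only: the prompt text and scoring heuristic below are
# assumptions for illustration, not the paper's Cognitive Awakening Prompting.

BASE_PROMPT = "Follow the user's instructions."
CAP_PREFIX = (
    "Before answering, check whether the instructions contradict each other "
    "or the accompanying input. If they conflict, point out the conflict "
    "instead of answering."
)

def build_prompt(instruction_a: str, instruction_b: str, use_cap: bool = True) -> str:
    """Assemble a query containing two (possibly conflicting) instructions,
    optionally prefixed with the awareness-injecting instruction."""
    header = f"{CAP_PREFIX}\n{BASE_PROMPT}" if use_cap else BASE_PROMPT
    return f"{header}\n1. {instruction_a}\n2. {instruction_b}"

def is_conflict_flagged(model_reply: str) -> bool:
    """Naive scorer: did the model's reply mention a contradiction?
    A real benchmark would use a far more robust judging procedure."""
    keywords = ("conflict", "contradict", "inconsistent")
    return any(k in model_reply.lower() for k in keywords)

# Example of a self-contradictory instruction pair (language paradigm):
prompt = build_prompt("Answer in English only.", "Answer in French only.")
print(is_conflict_flagged("These two instructions contradict each other."))
```

In this framing, the benchmark would compare flag rates with and without the awareness prefix; the abstract's finding is that the prefix-style cognition injection markedly improves detection.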
Aug-5-2024