Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios

Yunkai Dang, Mengxi Gao, Yibo Yan, Xin Zou, Yanggan Gu, Aiwei Liu, Xuming Hu

arXiv.org Artificial Intelligence 

Ensuring that Multimodal Large Language Models (MLLMs) maintain consistency in their responses is essential for developing trustworthy multimodal intelligence. However, existing benchmarks include many samples for which all MLLMs exhibit high response uncertainty when encountering misleading information, requiring as many as 5-15 response attempts per sample to assess uncertainty effectively. We therefore propose a two-stage pipeline: first, we collect MLLMs' responses without misleading information, and then we gather misleading responses elicited by specific misleading instructions. On this basis, we establish a Multimodal Uncertainty Benchmark (MUB) that employs both explicit and implicit misleading instructions to comprehensively assess the vulnerability of MLLMs across diverse domains. Our experiments reveal that all open-source and closed-source MLLMs are highly susceptible to misleading instructions, with an average misleading rate exceeding 86%. To enhance the robustness of MLLMs, we further fine-tune all ...
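As a rough illustration of the two-stage protocol sketched in the abstract, the following Python snippet shows one plausible way to estimate a misleading rate: query each model once without misleading information, then re-query with a misleading instruction and count how many originally correct answers flip. This is a minimal sketch under stated assumptions, not the authors' code; `query_mllm`, the sample tuple layout, and the `mislead_suffix` wording are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation): a hypothetical two-stage
# loop that estimates a "misleading rate" -- the fraction of originally
# correct answers that flip after a misleading instruction is appended.
# `query_mllm` is a placeholder for whatever MLLM inference call is available.

from typing import Callable, Iterable, Tuple

def misleading_rate(
    samples: Iterable[Tuple[str, str, str]],      # (image_path, question, gold_answer)
    query_mllm: Callable[[str, str], str],        # (image_path, prompt) -> answer
    mislead_suffix: str = "Hint: your previous answer is wrong.",  # hypothetical instruction
) -> float:
    flipped, initially_correct = 0, 0
    for image_path, question, gold in samples:
        # Stage 1: plain response without misleading information.
        first = query_mllm(image_path, question)
        if first.strip().lower() != gold.strip().lower():
            continue                               # only correct answers can be "misled"
        initially_correct += 1
        # Stage 2: re-query with an explicit misleading instruction appended.
        second = query_mllm(image_path, f"{question}\n{mislead_suffix}")
        if second.strip().lower() != gold.strip().lower():
            flipped += 1
    return flipped / initially_correct if initially_correct else 0.0
```

Restricting the denominator to initially correct answers is one reasonable reading of "misleading rate"; the benchmark may define the metric differently.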