Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective

Chen, Meiqi, Cao, Yixin, Zhang, Yan, Lu, Chaochao

Apr-3-2024–arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have facilitated the development of Multimodal LLMs (MLLMs). Despite their impressive capabilities, MLLMs often over-rely on unimodal biases (e.g., language bias and vision bias), leading to incorrect answers in complex multimodal tasks. To investigate this issue, we propose a causal framework to interpret the biases in Visual Question Answering (VQA) problems. Within our framework, we devise a causal graph to elucidate the predictions of MLLMs on VQA problems, and assess the causal effect of biases through an in-depth causal analysis. Motivated by the causal graph, we introduce a novel dataset, MORE, consisting of 12,000 VQA instances. This dataset is designed to challenge MLLMs' abilities: it requires multi-hop reasoning and overcoming unimodal biases. Furthermore, we propose two strategies to mitigate unimodal biases and enhance MLLMs' reasoning capabilities: a Decompose-Verify-Answer (DeVA) framework for limited-access MLLMs, and fine-tuning for open-source MLLMs. Extensive quantitative and qualitative experiments offer valuable insights for future research. Our project page is at https://opencausalab.github.io/MORE.

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Nevada > Clark County
      - Las Vegas (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - California > Los Angeles County
      - Long Beach (0.04)
    - Alaska > Denali Borough
      - Mt Mckinley (0.04)
- Europe
  - France (0.04)
  - Switzerland > Geneva
    - Geneva (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
- Asia
  - Singapore (0.04)
  - Middle East
    - Qatar (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
  - China > Shanghai
    - Shanghai (0.04)
- Africa
  - South Africa (0.04)
  - Ethiopia > Addis Ababa
    - Addis Ababa (0.04)

Genre:
- Research Report (0.40)

Industry:
- Transportation > Ground (0.96)
- Automobiles & Trucks > Manufacturer (0.73)
- Leisure & Entertainment > Sports
  - Soccer (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)
