Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models

Kibria, Md Raisul, Lafond, Sébastien, Arslan, Janan

Aug-7-2025–arXiv.org Artificial Intelligence

Multimodal learning has witnessed remarkable advancements in recent years, particularly with the integration of attention-based models, leading to significant performance gains across a variety of tasks. Parallel to this progress, the demand for explainable artificial intelligence (XAI) has spurred a growing body of research aimed at interpreting the complex decision-making processes of these models. This systematic literature review analyzes research published between January 2020 and early 2024 that focuses on the explainability of multimodal models. Framed within the broader goals of XAI, we examine the literature across multiple dimensions, including model architecture, modalities involved, explanation algorithms, and evaluation methodologies. Our analysis reveals that the majority of studies are concentrated on vision-language and language-only models, with attention-based techniques being the most commonly employed for explanation. However, these methods often fall short in capturing the full spectrum of interactions between modalities, a challenge further compounded by the architectural heterogeneity across domains. Importantly, we find that evaluation methods for XAI in multimodal settings are largely non-systematic, lacking consistency, robustness, and consideration for modality-specific cognitive and contextual factors. Based on these findings, we provide a comprehensive set of recommendations aimed at promoting rigorous, transparent, and standardized evaluation and reporting practices in multimodal XAI research. Our goal is to support future research toward more interpretable, accountable, and responsible multimodal AI systems, with explainability at their core.

explanation, large language model, machine learning


Country:
- Asia (0.92)
- Europe > Spain (0.28)
- North America > Canada (0.27)

Genre:
- Overview (1.00)
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology (0.68)
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Health Care Technology (1.00)
  - Therapeutic Area (0.93)
  - Diagnostic Medicine > Imaging (0.92)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Data Science > Data Mining (1.00)
  - Communications > Social Media (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Cognitive Science (1.00)
    - Issues > Social & Ethical Issues (0.86)
    - Natural Language
      - Large Language Model (1.00)
      - Explanation & Argumentation (1.00)
      - Text Processing (0.93)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (1.00)
