Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models
Kibria, Md Raisul, Lafond, Sébastien, Arslan, Janan
arXiv.org Artificial Intelligence
Multimodal learning has witnessed remarkable advancements in recent years, particularly with the integration of attention-based models, leading to significant performance gains across a variety of tasks. Parallel to this progress, the demand for explainable artificial intelligence (XAI) has spurred a growing body of research aimed at interpreting the complex decision-making processes of these models. This systematic literature review analyzes research published between January 2020 and early 2024 that focuses on the explainability of multimodal models. Framed within the broader goals of XAI, we examine the literature across multiple dimensions, including model architecture, modalities involved, explanation algorithms, and evaluation methodologies. Our analysis reveals that the majority of studies are concentrated on vision-language and language-only models, with attention-based techniques being the most commonly employed for explanation. However, these methods often fall short in capturing the full spectrum of interactions between modalities, a challenge further compounded by the architectural heterogeneity across domains. Importantly, we find that evaluation methods for XAI in multimodal settings are largely non-systematic, lacking consistency, robustness, and consideration for modality-specific cognitive and contextual factors. Based on these findings, we provide a comprehensive set of recommendations aimed at promoting rigorous, transparent, and standardized evaluation and reporting practices in multimodal XAI research. Our goal is to support future research in more interpretable, accountable, and responsible multimodal AI systems, with explainability at their core.
Aug-7-2025
- Country:
- Asia
- China > Hainan Province
- Haikou (0.04)
- Singapore > Central Region
- Singapore (0.04)
- Taiwan (0.04)
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- France (0.04)
- United Kingdom > England
- Staffordshire > Keele (0.04)
- Finland > Southwest Finland
- Turku (0.04)
- Romania > Sud - Muntenia Development Region
- Giurgiu County > Giurgiu (0.04)
- Spain
- Andalusia > Granada Province
- Granada (0.04)
- Galicia > Madrid (0.04)
- Italy > Tuscany
- Florence (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Switzerland > Zürich
- Zürich (0.14)
- North America
- Canada
- Dominican Republic (0.04)
- United States (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- Genre:
- Overview (1.00)
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Health & Medicine
- Diagnostic Medicine > Imaging (0.92)
- Health Care Technology (1.00)
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area (0.93)
- Information Technology (0.68)
- Technology:
- Information Technology
- Artificial Intelligence
- Cognitive Science (1.00)
- Issues > Social & Ethical Issues (0.86)
- Machine Learning
- Neural Networks > Deep Learning (1.00)
- Statistical Learning (1.00)
- Natural Language
- Explanation & Argumentation (1.00)
- Large Language Model (1.00)
- Text Processing (0.93)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Communications > Social Media (0.93)
- Data Science > Data Mining (1.00)
- Sensing and Signal Processing > Image Processing (1.00)