Quantifying Modality Contributions via Disentangling Multimodal Representations
Amit, Padegal; Kashyap, Omkar Mahesh; Rayasam, Namitha; Shekhar, Nidhi; Narayan, Surabhi
Quantifying modality contributions in multimodal models remains a challenge, as existing approaches conflate distinct notions of contribution. Prior work relies on accuracy-based metrics, interpreting the performance drop after removing a modality as indicative of its influence. However, such outcome-driven metrics fail to distinguish whether a modality is inherently informative or whether its value arises only through interaction with other modalities. This distinction is particularly important in cross-attention architectures, where modalities influence each other's representations. In this work, we propose a framework based on Partial Information Decomposition (PID) that quantifies modality contributions by decomposing the predictive information in internal embeddings into unique, redundant, and synergistic components. To enable scalable, inference-only analysis, we develop an algorithm based on the Iterative Proportional Fitting Procedure (IPFP) that computes layer- and dataset-level contributions without retraining. The result is a principled, representation-level view of multimodal behavior that offers clearer and more interpretable insights than outcome-based metrics.
arXiv.org Artificial Intelligence
Nov-26-2025
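The abstract names two technical ingredients: PID's split of predictive information into unique, redundant, and synergistic atoms, and IPFP as the estimation workhorse. The paper's own estimator runs on internal model embeddings and is not reproduced in this listing; the sketch below is a minimal toy illustration on a discrete joint distribution, assuming Williams and Beer's classical I_min redundancy as a stand-in PID measure and the textbook IPFP update. All function names, and the choice of redundancy measure, are illustrative assumptions, not the authors' code.

```python
# Minimal sketch, not the authors' implementation: PID atoms via the
# classical Williams & Beer I_min redundancy, plus the textbook IPFP.
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits from a 2-D joint probability table p(a, b)."""
    pa = p_ab.sum(axis=1, keepdims=True)
    pb = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0
    return float((p_ab[nz] * np.log2(p_ab[nz] / (pa * pb)[nz])).sum())

def specific_information(p_xt):
    """Specific information I(T=t; X) for each outcome t, one value per t."""
    px = p_xt.sum(axis=1, keepdims=True)          # p(x)
    pt = p_xt.sum(axis=0)                         # p(t)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (p_xt / pt) * np.log2((p_xt / px) / pt)
    return np.nansum(terms, axis=0)               # sum over x

def pid_atoms(p_xyt):
    """Split I(X,Y;T) into redundant, unique, and synergistic parts."""
    p_xt = p_xyt.sum(axis=1)                      # joint of (x, t)
    p_yt = p_xyt.sum(axis=0)                      # joint of (y, t)
    pt = p_xyt.sum(axis=(0, 1))
    i_xt, i_yt = mutual_information(p_xt), mutual_information(p_yt)
    i_xy_t = mutual_information(p_xyt.reshape(-1, p_xyt.shape[2]))
    # Redundancy = expected minimum specific information over the two sources.
    r = float((pt * np.minimum(specific_information(p_xt),
                               specific_information(p_yt))).sum())
    return {"R": r, "Ux": i_xt - r, "Uy": i_yt - r,
            "S": i_xy_t - (i_xt - r) - (i_yt - r) - r}

def ipfp(q, target_xt, target_yt, iters=500):
    """Iterative Proportional Fitting: rescale q(x,y,t) until its (x,t)
    and (y,t) marginals match the targets; no gradients, no retraining."""
    for _ in range(iters):
        qxt = q.sum(axis=1, keepdims=True)
        q = q * np.divide(target_xt[:, None, :], qxt,
                          out=np.zeros_like(qxt), where=qxt > 0)
        qyt = q.sum(axis=0, keepdims=True)
        q = q * np.divide(target_yt[None, :, :], qyt,
                          out=np.zeros_like(qyt), where=qyt > 0)
    return q

# Sanity check: T = XOR(X, Y) carries purely synergistic information.
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p[x, y, x ^ y] = 0.25
print(pid_atoms(p))  # ~ {'R': 0, 'Ux': 0, 'Uy': 0, 'S': 1.0}

# IPFP builds a surrogate joint with the observed per-modality marginals.
q = ipfp(np.full_like(p, 1 / p.size), p.sum(axis=1), p.sum(axis=0))
```

Here IPFP converges to a joint consistent with each modality's pairwise marginals with T while discarding the synergistic coupling, which is what makes an inference-only, retraining-free estimate plausible; the exact objective the paper pairs with IPFP, and its redundancy measure, may differ from this sketch.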