Quantifying Modality Contributions via Disentangling Multimodal Representations
Amit, Padegal; Kashyap, Omkar Mahesh; Rayasam, Namitha; Shekhar, Nidhi; Narayan, Surabhi
Quantifying modality contributions in multimodal models remains a challenge, as existing approaches conflate distinct notions of contribution. Prior work relies on accuracy-based metrics, interpreting the performance drop after removing a modality as indicative of its influence. However, such outcome-driven metrics fail to distinguish whether a modality is inherently informative or whether its value arises only through interaction with other modalities. This distinction is particularly important in cross-attention architectures, where modalities influence each other's representations. In this work, we propose a framework based on Partial Information Decomposition (PID) that quantifies modality contributions by decomposing the predictive information in internal embeddings into unique, redundant, and synergistic components. To enable scalable, inference-only analysis, we develop an algorithm based on the Iterative Proportional Fitting Procedure (IPFP) that computes layer- and dataset-level contributions without retraining. The result is a principled, representation-level view of multimodal behavior that offers clearer and more interpretable insights than outcome-based metrics.
arXiv.org Artificial Intelligence
Nov-26-2025
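The abstract names two technical ingredients: PID's split of predictive information into unique, redundant, and synergistic atoms, and IPFP as the estimation workhorse. The paper's own estimator runs on internal model embeddings and is not reproduced in this listing; the sketch below is a minimal toy illustration on a discrete joint distribution, assuming Williams and Beer's classical I_min redundancy as a stand-in PID measure and the textbook IPFP update. All function names, and the choice of redundancy measure, are illustrative assumptions, not the authors' code.

```python
# Minimal sketch, not the authors' implementation: PID atoms via the
# classical Williams & Beer I_min redundancy, plus the textbook IPFP.
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits from a 2-D joint probability table p(a, b)."""
    pa = p_ab.sum(axis=1, keepdims=True)
    pb = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0
    return float((p_ab[nz] * np.log2(p_ab[nz] / (pa * pb)[nz])).sum())

def specific_information(p_xt):
    """Specific information I(T=t; X) for each outcome t, one value per t."""
    px = p_xt.sum(axis=1, keepdims=True)          # p(x)
    pt = p_xt.sum(axis=0)                         # p(t)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (p_xt / pt) * np.log2((p_xt / px) / pt)
    return np.nansum(terms, axis=0)               # sum over x

def pid_atoms(p_xyt):
    """Split I(X,Y;T) into redundant, unique, and synergistic parts."""
    p_xt = p_xyt.sum(axis=1)                      # joint of (x, t)
    p_yt = p_xyt.sum(axis=0)                      # joint of (y, t)
    pt = p_xyt.sum(axis=(0, 1))
    i_xt, i_yt = mutual_information(p_xt), mutual_information(p_yt)
    i_xy_t = mutual_information(p_xyt.reshape(-1, p_xyt.shape[2]))
    # Redundancy = expected minimum specific information over the two sources.
    r = float((pt * np.minimum(specific_information(p_xt),
                               specific_information(p_yt))).sum())
    return {"R": r, "Ux": i_xt - r, "Uy": i_yt - r,
            "S": i_xy_t - (i_xt - r) - (i_yt - r) - r}

def ipfp(q, target_xt, target_yt, iters=500):
    """Iterative Proportional Fitting: rescale q(x,y,t) until its (x,t)
    and (y,t) marginals match the targets; no gradients, no retraining."""
    for _ in range(iters):
        qxt = q.sum(axis=1, keepdims=True)
        q = q * np.divide(target_xt[:, None, :], qxt,
                          out=np.zeros_like(qxt), where=qxt > 0)
        qyt = q.sum(axis=0, keepdims=True)
        q = q * np.divide(target_yt[None, :, :], qyt,
                          out=np.zeros_like(qyt), where=qyt > 0)
    return q

# Sanity check: T = XOR(X, Y) carries purely synergistic information.
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p[x, y, x ^ y] = 0.25
print(pid_atoms(p))  # ~ {'R': 0, 'Ux': 0, 'Uy': 0, 'S': 1.0}

# IPFP builds a surrogate joint with the observed per-modality marginals.
q = ipfp(np.full_like(p, 1 / p.size), p.sum(axis=1), p.sum(axis=0))
```

Here IPFP converges to a joint consistent with each modality's pairwise marginals with T while discarding the synergistic coupling, which is what makes an inference-only, retraining-free estimate plausible; the exact objective the paper pairs with IPFP, and its redundancy measure, may differ from this sketch.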