Efficient Quantification of Multimodal Interaction at Sample Level
Yang, Zequn, Wang, Hongfa, Hu, Di
Interactions between modalities -- redundancy, uniqueness, and synergy -- collectively determine the composition of multimodal information. Understanding these interactions is crucial for analyzing information dynamics in multimodal systems, yet their accurate sample-level quantification presents significant theoretical and computational challenges. To address this, we introduce the Lightweight Sample-wise Multimodal Interaction (LSMI) estimator, rigorously grounded in pointwise information theory. We first develop a redundancy estimation framework, employing an appropriate pointwise information measure to quantify this most decomposable and measurable interaction. Building upon this, we propose a general interaction estimation method that employs efficient entropy estimation, specifically tailored for sample-wise estimation in continuous distributions. Extensive experiments on synthetic and real-world datasets validate LSMI's precision and efficiency. Crucially, our sample-wise approach reveals fine-grained sample- and category-level dynamics within multimodal data, enabling practical applications such as redundancy-informed sample partitioning, targeted knowledge distillation, and interaction-aware model ensembling. The code is available at https://github.com/GeWu-Lab/LSMI_Estimator.
Jun-24-2025
- Country:
- Asia > China
- Europe > Switzerland
- North America > Canada (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (1.00)
- Representation & Reasoning (0.66)
- Vision (0.68)
- Information Management (0.87)
- Artificial Intelligence
- Information Technology