Balanced Multimodal Learning via Mutual Information
Xie, Rongrong, Sanguinetti, Guido
arXiv.org Artificial Intelligence
Multimodal learning aims to integrate complementary signals from diverse data types, yet in practice one modality often dominates training when information content, data quality, or sample size is imbalanced. This modality imbalance suppresses the benefits of integration and is especially problematic in biomedical applications such as multi-omics disease subtyping, where cohorts are small and assays vary in noise and coverage. Foundational syntheses emphasize fusion, alignment, and coordination as core challenges, but principled mechanisms that explicitly counter modality imbalance while preserving useful cross-modal structure remain limited [Baltrušaitis et al., 2018]. We propose a balanced multimodal framework for multi-omics classification that combines three ideas: (i) graph-based encoders that exploit cross-sample structure; (ii) cross-modal knowledge transfer to strengthen weaker modalities; and (iii) a multitask-style optimization procedure that adaptively reweights unimodal and multimodal losses based on performance signals and cross-modal dependence. Concretely, we employ a revised graph convolutional encoder in which node features may derive from a single modality, while edges are constructed from a fused similarity network across modalities. We then pretrain weaker modalities via knowledge distillation from a stronger teacher to transfer predictive structure without overfitting [Hinton et al., 2015, Furlanello et al., 2018]. Finally, we train the joint model with dynamic loss balancing so that no single modality dictates the gradients, leveraging advances in multitask optimization [Chen et al., 2018, Kendall et al., 2018].
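The two optimization ingredients named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the standard temperature-scaled distillation loss in the style of Hinton et al. (2015) and one commonly used form of the uncertainty weighting of Kendall et al. (2018). The function names, the `temperature` and `alpha` defaults, and the log-variance parameterization are all illustrative assumptions, and the paper's own reweighting based on performance signals and cross-modal dependence is not reproduced here.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, computed stably."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max to avoid overflow
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy.

    Hypothetical helper: the weaker modality plays the student and the
    stronger modality plays the teacher.
    """
    eps = 1e-12
    labels = np.asarray(labels)
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # KL divergence between softened distributions, averaged over samples.
    kl = np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student_soft + eps)),
        axis=1,
    ).mean()
    # Ordinary cross-entropy of the student against the hard labels.
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + eps).mean()
    return float(alpha * temperature ** 2 * kl + (1.0 - alpha) * ce)


def uncertainty_weighted_sum(losses, log_vars):
    """sum_i exp(-s_i) * L_i + s_i, where s_i is a per-loss log-variance.

    In training, s_i would be learnable; here it is passed in (assumption).
    """
    losses = np.asarray(losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))


# Toy usage: 3 samples, 2 classes, made-up logits.
teacher = np.array([[2.0, -1.0], [0.5, 1.5], [-1.0, 2.0]])
student = np.array([[1.0, 0.0], [0.0, 0.5], [0.2, 0.1]])
y = np.array([0, 1, 1])
kd = distillation_loss(student, teacher, y)
total = uncertainty_weighted_sum([kd, 0.7], [0.0, 0.0])
```

With all log-variances at zero the weighted sum reduces to a plain sum of the losses. Raising a loss's log-variance shrinks that loss's weight, which is one way to keep a single modality from dominating the gradients.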
Nov-4-2025