 strong modality


Balanced Multimodal Learning via Mutual Information

arXiv.org Artificial Intelligence

Multimodal learning aims to integrate complementary signals from diverse data types, yet in practice one modality often dominates training when information content, data quality, or sample size are imbalanced. This modality imbalance suppresses the benefits of integration and is especially problematic in biomedical applications such as multi-omics disease subtyping, where cohorts are small and assays vary in noise and coverage. Foundational syntheses emphasize fusion, alignment, and coordination as core challenges, but principled mechanisms that explicitly counter modality imbalance while preserving useful cross-modal structure remain limited [Baltrušaitis et al., 2018]. We propose a balanced multimodal framework for multi-omics classification that combines three ideas: (i) graph-based encoders that exploit cross-sample structure; (ii) cross-modal knowledge transfer to strengthen weaker modalities; and (iii) a multitask-style optimization procedure that adaptively reweights unimodal and multimodal losses based on performance signals and cross-modal dependence. Concretely, we employ a revised graph convolutional encoder in which node features may derive from a single modality, while edges are constructed from a fused similarity network across modalities. We then pretrain weaker modalities via knowledge distillation from a stronger teacher to transfer predictive structure without overfitting [Hinton et al., 2015; Furlanello et al., 2018]. Finally, we train the joint model with dynamic loss balancing so that no single modality dictates the gradients, leveraging advances in multitask optimization [Chen et al., 2018; Kendall et al., 2018].
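The knowledge-distillation step mentioned in this abstract can be illustrated with the standard soft-target loss of Hinton et al. [2015]. The following is a minimal sketch for a single sample, not the authors' implementation: the function names, the `temperature` and `alpha` defaults, and the choice to mix the soft and hard terms with a single weight are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: soft-target KL term plus hard-label cross-entropy.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    # Ordinary cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard


# A weak-modality student that agrees with the teacher incurs a smaller loss.
teacher = [0.2, 0.1, 3.0]
agreeing = distillation_loss([0.1, 0.0, 2.5], teacher, label=2)
disagreeing = distillation_loss([2.5, 0.0, 0.1], teacher, label=2)
print(agreeing < disagreeing)
```

In the setting described above, the teacher logits would come from the stronger modality's encoder and the student logits from a weaker one. How the paper combines this term with its dynamic loss balancing is not specified in the abstract.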


Rebalanced Vision-Language Retrieval Considering Structure-Aware Distillation

arXiv.org Artificial Intelligence

Vision-language retrieval aims to search for similar instances in one modality based on queries from another modality. The primary objective is to learn cross-modal matching representations in a latent common space. The assumption underlying cross-modal matching is modal balance, where each modality contains sufficient information to represent the others. However, noise interference and modality insufficiency often lead to modal imbalance, making it a common phenomenon in practice. The impact of imbalance on retrieval performance remains an open question. In this paper, we first demonstrate that ultimate cross-modal matching is generally sub-optimal for cross-modal retrieval when imbalanced modalities exist. Imbalanced modalities inherently distort the structure of instances in the common space, which poses a challenge to cross-modal similarity measurement. To address this issue, we emphasize the importance of meaningful structure-preserved matching. Accordingly, we propose a simple yet effective method to rebalance cross-modal matching by learning structure-preserved matching representations. Specifically, we design a novel multi-granularity cross-modal matching that incorporates structure-aware distillation alongside the cross-modal matching loss. While the cross-modal matching loss constrains instance-level matching, the structure-aware distillation further regularizes the geometric consistency between learned matching representations and intra-modal representations through the developed relational matching. Extensive experiments on different datasets affirm the superior cross-modal retrieval performance of our approach, which also enhances single-modal retrieval capabilities compared to the baseline models.
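One way to read "geometric consistency between learned matching representations and intra-modal representations" is as agreement between their pairwise similarity structures. The sketch below illustrates that reading only: the abstract does not define the paper's relational matching, so the cosine-similarity matrices, the mean squared difference, and both function names are assumptions.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarities between all rows of `vectors`."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


def structure_consistency_loss(matching_reprs, intra_modal_reprs):
    """Hypothetical regularizer: mean squared gap between similarity matrices.

    Both inputs hold one embedding per instance, in the same instance order.
    """
    s_match = cosine_similarity_matrix(matching_reprs)
    s_intra = cosine_similarity_matrix(intra_modal_reprs)
    n = len(s_match)
    return sum((s_match[i][j] - s_intra[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


intra = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
preserved = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]   # same angles, rescaled
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # all instances identical
print(structure_consistency_loss(preserved, intra))
print(structure_consistency_loss(collapsed, intra))
```

Under this reading, embeddings that keep the intra-modal neighborhood structure are not penalized, while embeddings that collapse distinct instances together are. In training, such a term would be added to the instance-level matching loss with a weighting that the abstract does not state.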


Auxiliary Information Regularized Machine for Multiple Modality Feature Learning

AAAI Conferences

In real-world applications, data often come with multiple modalities. Previous works assumed that each modality contains sufficient information for the target and can be treated with equal importance. However, different modalities are often of varying importance in real tasks, e.g., the facial feature is a weak modality and the fingerprint feature is a strong modality in ID recognition. In this paper, we point out that different modalities should be treated with different strategies and propose the Auxiliary information Regularized Machine (ARM), which works by extracting the most discriminative feature subspace of the weak modality while regularizing the ...

It is notable that strong modal features can lead to better performance but are more expensive; therefore, a group of serialized feature extraction methods were proposed. These methods extract weak modal features first, and then gradually extract stronger modal features to improve performance while reducing the overall cost. Marcialis et al. [2010] proposed a serial fusion technique for multiple biometric modal features that extracts gait information and face information step by step; Zhang et al. [2014] addressed serialized multi-modal learning in a semi-supervised learning scenario. These methods handle strong and weak modalities independently while leaving the unsatisfactory performance on the weak modality unexplained.