Information Fusion
Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes
Waqas, Asim, Tripathi, Aakash, Stewart, Paul, Naeini, Mia, Rasool, Ghulam
Cancer clinics capture disease data at various scales, from genetic to organ level. Current bioinformatic methods struggle to handle the heterogeneous nature of this data, especially with missing modalities. We propose PARADIGM, a Graph Neural Network (GNN) framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction. PARADIGM generates embeddings from multi-resolution data using foundation models, aggregates them into patient-level representations, fuses them into a unified graph, and enhances performance for tasks like survival analysis. We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data. Multimodal GNN outperforms other models in patient survival prediction. Converging individual data modalities across varying scales provides a more insightful disease view. Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.
SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models
Lowman, James, Zheng, Kelly Liu, Fraser, Roydon, The, Jesse Van Griensven, Valipour, Mojtaba
SeeFar is an evolving collection of multi-resolution satellite images from public and commercial satellites. We specifically curated this dataset for training geospatial foundation models, unconstrained by satellite type. In recent years, advances in technology have made satellite imagery more accessible than ever. More earth-observing satellites have been launched in the last five years than in the previous fifty. Modern commercial satellites now offer up to 100 times the spatial resolution of public access satellites. However, the high cost and limited historical availability of commercial satellite imagery is a barrier to the training of foundational models, impacting what images can be used during inference. The SeeFar dataset represents a step towards training models that are satellite-agnostic by combining multi-resolution commercial and public access pre-processed images. This will enable users to utilize historical data alongside higher-resolution, more expensive satellite imagery, offering greater flexibility during inference. To achieve this, we describe a process for standardizing data from diverse satellite sources, normalizing different data formats, and aligning spectral bands to enhance interoperability. The SeeFar dataset includes images at a resolution of 384x384 pixels, spanning four spectral bands (Blue, Green, Red, and Near-Infrared) and expanding spatial resolutions (starting with 30, 10, 1.5, and 1.0 meters), all in cloud-optimized GeoTIFF format. It also provides consistent and comprehensive metadata to enhance data transparency and reliability. By aggregating data from multiple sources, SeeFar makes processed and consistent satellite data accessible to a wider range of users - from researchers to policymakers - fostering competition and innovation in satellite imagery analysis. The dataset is available at \url{coastalcarbon.ai/seefar}.
Stabilized Adaptive Steering for 3D Sonar Microphone Arrays with IMU Sensor Fusion
Jansen, Wouter, Laurijssen, Dennis, Steckel, Jan
This paper presents a novel software-based approach to stabilizing the acoustic images for in-air 3D sonars. Due to uneven terrain, traditional static beamforming techniques can be misaligned, causing inaccurate measurements and imaging artifacts. Furthermore, mechanical stabilization can be more costly and prone to failure. We propose using an adaptive conventional beamforming approach by fusing it with real-time IMU data to adjust the sonar array's steering matrix dynamically based on the elevation tilt angle caused by the uneven ground. Additionally, we propose gaining compensation to offset emission energy loss due to the transducer's directivity pattern and validate our approach through various experiments, which show significant improvements in temporal consistency in the acoustic images. We implemented a GPU-accelerated software system that operates in real-time with an average execution time of 210ms, meeting autonomous navigation requirements.
FusionINN: Decomposable Image Fusion for Brain Tumor Monitoring
Kumar, Nishant, Tao, Ziyan, Singh, Jaikirat, Li, Yang, Sun, Peiwen, Zhao, Binghui, Gumhold, Stefan
Image fusion typically employs non-invertible neural networks to merge multiple source images into a single fused image. However, for clinical experts, solely relying on fused images may be insufficient for making diagnostic decisions, as the fusion mechanism blends features from source images, thereby making it difficult to interpret the underlying tumor pathology. We introduce FusionINN, a novel decomposable image fusion framework, capable of efficiently generating fused images and also decomposing them back to the source images. FusionINN is designed to be bijective by including a latent image alongside the fused image, while ensuring minimal transfer of information from the source images to the latent representation. To the best of our knowledge, we are the first to investigate the decomposability of fused images, which is particularly crucial for life-sensitive applications such as medical image fusion compared to other tasks like multi-focus or multi-exposure image fusion. Our extensive experimentation validates FusionINN over existing discriminative and generative fusion methods, both subjectively and objectively. Moreover, compared to a recent denoising diffusion-based fusion model, our approach offers faster and qualitatively better fusion results.
Domain Agnostic Conditional Invariant Predictions for Domain Generalization
Wang, Zongbin, Pan, Bin, Shi, Zhenwei
Domain generalization aims to develop a model that can perform well on unseen target domains by learning from multiple source domains. However, recent-proposed domain generalization models usually rely on domain labels, which may not be available in many real-world scenarios. To address this challenge, we propose a Discriminant Risk Minimization (DRM) theory and the corresponding algorithm to capture the invariant features without domain labels. In DRM theory, we prove that reducing the discrepancy of prediction distribution between overall source domain and any subset of it can contribute to obtaining invariant features. To apply the DRM theory, we develop an algorithm which is composed of Bayesian inference and a new penalty termed as Categorical Discriminant Risk (CDR). In Bayesian inference, we transform the output of the model into a probability distribution to align with our theoretical assumptions. We adopt sliding update approach to approximate the overall prediction distribution of the model, which enables us to obtain CDR penalty. We also indicate the effectiveness of these components in finding invariant features. We evaluate our algorithm against various domain generalization methods on multiple real-world datasets, providing empirical support for our theory.
Advancing Histopathology-Based Breast Cancer Diagnosis: Insights into Multi-Modality and Explainability
Abdullakutty, Faseela, Akbari, Younes, Al-Maadeed, Somaya, Bouridane, Ahmed, Hamoudi, Rifat
As a leading cause of mortality among women globally, the precise and timely diagnosis of breast cancer remains imperative for optimizing patient outcomes. While traditional diagnostic methodologies [2] have historically relied heavily on uni-modal approaches, the evolving landscape of medical data analytics underscores the significance of integrating diverse data sources beyond conventional imaging modalities [3]. Figure 1 illustrates a generic model for breast cancer diagnosis within the Computer-Aided Detection (CAD) framework. As depicted in Figure 2, breast cancer detection can be performed using various data types, employing either unimodal or multimodal approaches. The process initiates with data pre-processing, followed by feature extraction. To enhance the learning of feature representations from image data, segmentation may be conducted prior to feature extraction. Subsequently, the detection model is applied to generate a diagnosis from the processed data. Based on this diagnosis, further analyses are performed, including sub-type classification, grade classification, recurrence and metastasis prediction, as well as the incorporation of crowdsourcing and human-in-the-loop methodologies. These steps culminate in a final decision that informs subsequent treatment and monitoring strategies.
Stock Movement Prediction with Multimodal Stable Fusion via Gated Cross-Attention Mechanism
Zong, Chang, Shao, Jian, Lu, Weiming, Zhuang, Yueting
The accurate prediction of stock movements is crucial for investment strategies. Stock prices are subject to the influence of various forms of information, including financial indicators, sentiment analysis, news documents, and relational structures. Predominant analytical approaches, however, tend to address only unimodal or bimodal sources, neglecting the complexity of multimodal data. Further complicating the landscape are the issues of data sparsity and semantic conflicts between these modalities, which are frequently overlooked by current models, leading to unstable performance and limiting practical applicability. To address these shortcomings, this study introduces a novel architecture, named Multimodal Stable Fusion with Gated Cross-Attention (MSGCA), designed to robustly integrate multimodal input for stock movement prediction. The MSGCA framework consists of three integral components: (1) a trimodal encoding module, responsible for processing indicator sequences, dynamic documents, and a relational graph, and standardizing their feature representations; (2) a cross-feature fusion module, where primary and consistent features guide the multimodal fusion of the three modalities via a pair of gated cross-attention networks; and (3) a prediction module, which refines the fused features through temporal and dimensional reduction to execute precise movement forecasting. Empirical evaluations demonstrate that the MSGCA framework exceeds current leading methods, achieving performance gains of 8.1%, 6.1%, 21.7% and 31.6% on four multimodal datasets, respectively, attributed to its enhanced multimodal fusion stability.
FLOW: Fusing and Shuffling Global and Local Views for Cross-User Human Activity Recognition with IMUs
Qiu, Qi, Zhu, Tao, Duan, Furong, Wang, Kevin I-Kai, Chen, Liming, Nie, Mingxing, Nie, Mingxing
Inertial Measurement Unit (IMU) sensors are widely employed for Human Activity Recognition (HAR) due to their portability, energy efficiency, and growing research interest. However, a significant challenge for IMU-HAR models is achieving robust generalization performance across diverse users. This limitation stems from substantial variations in data distribution among individual users. One primary reason for this distribution disparity lies in the representation of IMU sensor data in the local coordinate system, which is susceptible to subtle user variations during IMU wearing. To address this issue, we propose a novel approach that extracts a global view representation based on the characteristics of IMU data, effectively alleviating the data distribution discrepancies induced by wearing styles. To validate the efficacy of the global view representation, we fed both global and local view data into model for experiments. The results demonstrate that global view data significantly outperforms local view data in cross-user experiments. Furthermore, we propose a Multi-view Supervised Network (MVFNet) based on Shuffling to effectively fuse local view and global view data. It supervises the feature extraction of each view through view division and view shuffling, so as to avoid the model ignoring important features as much as possible. Extensive experiments conducted on OPPORTUNITY and PAMAP2 datasets demonstrate that the proposed algorithm outperforms the current state-of-the-art methods in cross-user HAR.
DF-DM: A foundational process model for multimodal data fusion in the artificial intelligence era
Restrepo, David, Wu, Chenwei, Vásquez-Venegas, Constanza, Nakayama, Luis Filipe, Celi, Leo Anthony, López, Diego M
In the big data era, integrating diverse data modalities poses significant challenges, particularly in complex fields like healthcare. This paper introduces a new process model for multimodal Data Fusion for Data Mining, integrating embeddings and the Cross-Industry Standard Process for Data Mining with the existing Data Fusion Information Group model. Our model aims to decrease computational costs, complexity, and bias while improving efficiency and reliability. We also propose "disentangled dense fusion", a novel embedding fusion method designed to optimize mutual information and facilitate dense inter-modality feature interaction, thereby minimizing redundant information. We demonstrate the model's efficacy through three use cases: predicting diabetic retinopathy using retinal images and patient metadata, domestic violence prediction employing satellite imagery, internet, and census data, and identifying clinical and demographic features from radiography images and clinical notes. The model achieved a Macro F1 score of 0.92 in diabetic retinopathy prediction, an R-squared of 0.854 and sMAPE of 24.868 in domestic violence prediction, and a macro AUC of 0.92 and 0.99 for disease prediction and sex classification, respectively, in radiological analysis. These results underscore the Data Fusion for Data Mining model's potential to significantly impact multimodal data processing, promoting its adoption in diverse, resource-constrained settings.
Cross-Training with Multi-View Knowledge Fusion for Heterogenous Federated Learning
Qi, Zhuang, Meng, Lei, He, Weihao, Zhang, Ruohan, Wang, Yu, Qi, Xin, Meng, Xiangxu
Federated learning benefits from cross-training strategies, which enables models to train on data from distinct sources to improve the generalization capability. However, the data heterogeneity between sources may lead models to gradually forget previously acquired knowledge when undergoing cross-training to adapt to new tasks or data sources. We argue that integrating personalized and global knowledge to gather information from multiple perspectives could potentially improve performance. To achieve this goal, this paper presents a novel approach that enhances federated learning through a cross-training scheme incorporating multi-view information. Specifically, the proposed method, termed FedCT, includes three main modules, where the consistency-aware knowledge broadcasting module aims to optimize model assignment strategies, which enhances collaborative advantages between clients and achieves an efficient federated learning process. The multi-view knowledge-guided representation learning module leverages fused prototypical knowledge from both global and local views to enhance the preservation of local knowledge before and after model exchange, as well as to ensure consistency between local and global knowledge. The mixup-based feature augmentation module aggregates rich information to further increase the diversity of feature spaces, which enables the model to better discriminate complex samples. Extensive experiments were conducted on four datasets in terms of performance comparison, ablation study, in-depth analysis and case study. The results demonstrated that FedCT alleviates knowledge forgetting from both local and global views, which enables it outperform state-of-the-art methods.