
 Information Fusion


Federated Continual Learning: Concepts, Challenges, and Solutions

arXiv.org Artificial Intelligence

Federated Continual Learning (FCL) has emerged as a robust solution for collaborative model training in dynamic environments, where data samples are continuously generated and distributed across multiple devices. This survey provides a comprehensive review of FCL, focusing on key challenges such as heterogeneity, model stability, communication overhead, and privacy preservation. We explore various forms of heterogeneity and their impact on model performance. Solutions to non-IID data, resource-constrained platforms, and personalized learning are reviewed to show the complexities of handling heterogeneous data distributions. Next, we review techniques for ensuring model stability and avoiding catastrophic forgetting, which are critical in non-stationary environments. Privacy-preserving techniques are another aspect of FCL reviewed in this work. This survey integrates insights from federated learning and continual learning to present strategies for improving the efficacy and scalability of FCL systems, making them applicable to a wide range of real-world scenarios.


Localized Data Fusion for Kernel k-Means Clustering with Application to Cancer Biology

Neural Information Processing Systems

In many modern applications, for example in bioinformatics and computer vision, samples have multiple feature representations coming from different data sources. Multiview learning algorithms try to exploit all of this available information to obtain a better learner in such scenarios. In this paper, we propose a novel multiple kernel learning algorithm that extends kernel k-means clustering to the multiview setting, which combines kernels calculated on the views in a localized way to better capture sample-specific characteristics of the data. We demonstrate the better performance of our localized data fusion approach on a human colon and rectal cancer data set by clustering patients. Our method finds more relevant prognostic patient groups than global data fusion methods when we evaluate the results with respect to three commonly used clinical biomarkers.
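To make "combining kernels in a localized way" concrete, here is a minimal Python sketch. It assumes each sample i carries its own weight for each view, `theta[i][m]`, and that the combined similarity of samples i and j is the sum over views of `theta[i][m] * theta[j][m] * K_m[i][j]`. The abstract does not state the exact form or how the weights are learned, so both the formula and the toy values below are illustrative assumptions.

```python
def localized_kernel(kernels, theta):
    """Combine per-view kernel matrices using sample-specific weights.

    kernels: list of M square matrices (lists of lists), one per view.
    theta:   n x M matrix; theta[i][m] is the weight sample i gives view m.
    Assumed form: K[i][j] = sum_m theta[i][m] * theta[j][m] * K_m[i][j].
    """
    n = len(theta)
    combined = [[0.0] * n for _ in range(n)]
    for m, k_m in enumerate(kernels):
        for i in range(n):
            for j in range(n):
                combined[i][j] += theta[i][m] * theta[j][m] * k_m[i][j]
    return combined


# Toy data: two views, three samples (all values are illustrative).
K1 = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
K2 = [[1.0, 0.1, 0.9], [0.1, 1.0, 0.3], [0.9, 0.3, 1.0]]
theta = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
K = localized_kernel([K1, K2], theta)
```

If every row of `theta` is identical, the same function reduces to a global weighted sum of the view kernels, which is the kind of global fusion the abstract compares against.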


Multi-modal Data Fusion and Deep Ensemble Learning for Accurate Crop Yield Prediction

arXiv.org Artificial Intelligence

This study introduces RicEns-Net, a novel Deep Ensemble model designed to predict crop yields by integrating diverse data sources through multimodal data fusion techniques. The research focuses specifically on the use of synthetic aperture radar (SAR), optical remote sensing data from Sentinel 1, 2, and 3 satellites, and meteorological measurements such as surface temperature and rainfall. The initial field data for the study were acquired through Ernst & Young's (EY) Open Science Challenge 2023. The primary objective is to enhance the precision of crop yield prediction by developing a machine-learning framework capable of handling complex environmental data. A comprehensive data engineering process was employed to select the most informative features from over 100 potential predictors, reducing the set to 15 features from 5 distinct modalities. This step mitigates the "curse of dimensionality" and enhances model performance. The RicEns-Net architecture combines multiple machine learning algorithms in a deep ensemble framework, integrating the strengths of each technique to improve predictive accuracy. Experimental results demonstrate that RicEns-Net achieves a mean absolute error (MAE) of 341 kg/Ha (roughly 5-6% of the lowest average yield in the region), significantly exceeding the performance of previous state-of-the-art models, including those developed during the EY challenge.
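For reference, the mean absolute error is the average of the absolute differences between observed and predicted values. The short Python sketch below computes it on made-up yields; the numbers are hypothetical and are not taken from the study.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical yields in kg/Ha, for illustration only.
observed = [6000.0, 6500.0, 7000.0]
predicted = [6300.0, 6200.0, 7400.0]
mae = mean_absolute_error(observed, predicted)
```

Because MAE is expressed in the same unit as the target (here kg/Ha), it can be compared directly against typical yields, as the abstract does.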


Review for NeurIPS paper: BayReL: Bayesian Relational Learning for Multi-omics Data Integration

Neural Information Processing Systems

Summary and Contributions: In this paper, the authors propose a Bayesian representation learning framework that can infer links between heterogeneous graphs generated from multi-omics datasets. The main idea is to use the underlying relationship information within each dataset (or view) by modeling it as a graph. The method has four steps: (1) embed the nodes of each view-specific graph into the same latent space; (2) generate a multi-view adjacency tensor using the similarity scores for node embeddings across views; (3) infer prior latent variables from the node embeddings and multi-view graphs, and the posterior from the view-specific data; (4) perform variational inference to optimize model parameters and variational parameters. The paper attempts to solve an important problem of multi-omics data integration by learning relationships that can exist between different modalities, modeling them as multi-view link prediction. This work could be useful to the broader ML community.
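Step (2) of the summary above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the similarity between two nodes from different views is the sigmoid of the inner product of their embeddings; the review does not say which similarity BayReL uses, and the embeddings below are invented.

```python
import math


def cross_view_adjacency(emb_a, emb_b):
    """Return a |A| x |B| matrix of link scores between nodes of two views.

    Score = sigmoid(dot(u, v)), an assumed similarity for illustration.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [
        [sigmoid(sum(ui * vi for ui, vi in zip(u, v))) for v in emb_b]
        for u in emb_a
    ]


# Hypothetical 2-D embeddings in a shared latent space.
view_a = [[1.0, 0.0], [0.0, 2.0]]
view_b = [[2.0, 0.0], [0.0, -2.0], [0.0, 0.0]]
scores = cross_view_adjacency(view_a, view_b)
```

Stacking one such matrix per pair of views would give a multi-view adjacency tensor in the sense of the summary; scores above 0.5 mark node pairs whose embeddings point in similar directions.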


Review for NeurIPS paper: BayReL: Bayesian Relational Learning for Multi-omics Data Integration

Neural Information Processing Systems

The paper proposes a Bayesian formulation for the integration of multi-omics datasets by combining within-view and between-view interactions. Although the paper is conceptually related to prior work, the reviewers appreciate the contributions made, which are both timely and relevant to the NeurIPS community. Overall, this is a solid submission, and the authors convincingly address the concerns raised in their rebuttal.


CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction

arXiv.org Artificial Intelligence

Recent advancements in AI have revolutionized property prediction in materials science and accelerated materials discovery. Graph neural networks (GNNs) stand out due to their ability to represent crystal structures as graphs, effectively capturing local interactions and delivering superior predictions. However, these methods often lose critical global information, such as crystal systems and repetitive unit connectivity. To address this, we propose CAST, a cross-attention-based multimodal fusion model that integrates graph and text modalities to preserve essential material information. CAST combines node- and token-level features using cross-attention mechanisms, surpassing previous approaches reliant on material-level embeddings like graph mean-pooling or [CLS] tokens. A masked node prediction pretraining strategy further enhances atomic-level information integration. Our method achieved up to 22.9% improvement in property prediction across four crystal properties, including band gap, compared to methods like CrysMMNet and MultiMat. Pretraining was key to aligning node and text embeddings, with attention maps confirming its effectiveness in capturing relationships between nodes and tokens. This study highlights the potential of multimodal learning in materials science, paving the way for more robust predictive models that incorporate both local and global information.
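As a rough illustration of combining node- and token-level features with cross-attention, the Python sketch below implements plain single-head scaled dot-product attention in which graph node features act as queries over text token features. It omits learned projection matrices, multiple heads, and anything else specific to CAST, and the feature values are invented.

```python
import math


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Returns (outputs, weights); weights[i][j] is how much query i
    attends to key j, and each row of weights sums to 1.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append(
            [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical node (query) and token (key/value) features.
node_feats = [[1.0, 0.0], [0.0, 1.0]]
token_feats = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
fused, attn = cross_attention(node_feats, token_feats, token_feats)
```

The `attn` matrix is the kind of node-to-token attention map the abstract refers to: each row shows which tokens a given node draws information from.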


MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction

arXiv.org Artificial Intelligence

The map construction task plays a vital role in providing the precise and comprehensive static environmental information essential for autonomous driving systems. Primary sensors include cameras and LiDAR, with configurations varying between camera-only, LiDAR-only, or camera-LiDAR fusion, based on cost-performance considerations. While fusion-based methods typically perform best, existing approaches often neglect modality interaction and rely on simple fusion strategies, which suffer from misalignment and information loss. To address these issues, we propose MapFusion, a novel multi-modal Bird's-Eye View (BEV) feature fusion method for map construction. Specifically, to solve the semantic misalignment problem between camera and LiDAR BEV features, we introduce the Cross-modal Interaction Transform (CIT) module, enabling interaction between two BEV feature spaces and enhancing feature representation through a self-attention mechanism. Additionally, we propose an effective Dual Dynamic Fusion (DDF) module to adaptively select valuable information from different modalities, which can take full advantage of the inherent information between different modalities. Moreover, MapFusion is designed to be simple and plug-and-play, easily integrated into existing pipelines. We evaluate MapFusion on two map construction tasks, High-definition (HD) map construction and BEV map segmentation, to show its versatility and effectiveness. Compared with the state-of-the-art methods, MapFusion achieves 3.6% and 6.2% absolute improvements on the HD map construction and BEV map segmentation tasks on the nuScenes dataset, respectively, demonstrating the superiority of our approach.


Event-aided Semantic Scene Completion

arXiv.org Artificial Intelligence

Autonomous driving systems rely on robust 3D scene understanding. Recent advances in Semantic Scene Completion (SSC) for autonomous driving underscore the limitations of RGB-based approaches, which struggle under motion blur, poor lighting, and adverse weather. Event cameras, offering high dynamic range and low latency, address these challenges by providing asynchronous data that complements RGB inputs. We present DSEC-SSC, the first real-world benchmark specifically designed for event-aided SSC, which includes a novel 4D labeling pipeline for generating dense, visibility-aware labels that adapt dynamically to object motion. Our proposed RGB-Event fusion framework, EvSSC, introduces an Event-aided Lifting Module (ELM) that effectively bridges 2D RGB-Event features to 3D space, enhancing view transformation and the robustness of 3D volume construction across SSC models. Extensive experiments on DSEC-SSC and simulated SemanticKITTI-E demonstrate that EvSSC is adaptable to both transformer-based and LSS-based SSC architectures. Notably, evaluations on SemanticKITTI-C demonstrate that EvSSC achieves consistently improved prediction accuracy across five degradation modes and both in-domain and out-of-domain settings, achieving up to a 52.5% relative improvement in mIoU when the image sensor partially fails. Additionally, we quantitatively and qualitatively validate the superiority of EvSSC under motion blur and extreme weather conditions, where autonomous driving is challenged. The established datasets and our codebase will be made publicly available at https://github.com/Pandapan01/EvSSC.


Concept-Aware Latent and Explicit Knowledge Integration for Enhanced Cognitive Diagnosis

arXiv.org Artificial Intelligence

Cognitive diagnosis can infer students' mastery of specific knowledge concepts based on historical response logs. However, existing cognitive diagnostic models (CDMs) represent students' proficiency from a unidimensional perspective, which cannot comprehensively assess their mastery of each knowledge concept. Moreover, the Q-matrix binarizes the relationship between exercises and knowledge concepts, so it cannot represent latent relationships between them. In particular, as the granularity of knowledge attributes becomes finer, the Q-matrix becomes increasingly incomplete, and the sparse binary representation (0/1) fails to capture the intricate relationships among knowledge concepts. To address these issues, we propose a Concept-aware Latent and Explicit Knowledge Integration model for cognitive diagnosis (CLEKI-CD). Specifically, a multidimensional vector is constructed according to the students' mastery and exercise difficulty for each knowledge concept from multiple perspectives, which enhances the representation capabilities of the model. Moreover, a latent Q-matrix is generated by our proposed attention-based knowledge aggregation method, and it can uncover the coverage degree of exercises over latent knowledge. The latent Q-matrix can supplement the sparse explicit Q-matrix with the inherent relationships among knowledge concepts and mitigate the knowledge coverage problem. Furthermore, we employ a combined cognitive diagnosis layer to integrate both latent and explicit knowledge, further enhancing cognitive diagnosis performance. Extensive experiments on real-world datasets demonstrate that CLEKI-CD outperforms the state-of-the-art models. The proposed CLEKI-CD is promising for practical applications in intelligent education, as it produces diagnostic results with good interpretability.
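The contrast between a sparse binary Q-matrix and a latent one that supplements it can be sketched in Python as follows. The combination rule used here, an elementwise maximum that keeps explicit 1s and fills zeros with latent scores, is an assumption chosen for illustration and is not the paper's attention-based aggregation; all values are invented.

```python
def supplement_q_matrix(explicit_q, latent_q):
    """Fill gaps in a binary Q-matrix with soft latent coverage scores.

    explicit_q: exercises x concepts, entries 0 or 1.
    latent_q:   same shape, entries in [0, 1].
    Assumed rule: elementwise maximum of the two matrices.
    """
    return [
        [max(e, s) for e, s in zip(e_row, s_row)]
        for e_row, s_row in zip(explicit_q, latent_q)
    ]


# Two exercises, three knowledge concepts (hypothetical values).
explicit_q = [[1, 0, 0], [0, 1, 0]]
latent_q = [[0.9, 0.4, 0.0], [0.1, 0.7, 0.6]]
combined_q = supplement_q_matrix(explicit_q, latent_q)
```

In this toy case the second exercise is labeled only with the second concept, yet the latent scores suggest it also touches the third; the combined matrix records that partial coverage instead of a hard zero.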


Solgenia -- A Test Vessel Toward Energy-Efficient Autonomous Water Taxi Applications

arXiv.org Artificial Intelligence

Autonomous surface vessels are a promising building block of the future's transport sector and are investigated by research groups worldwide. This paper presents a comprehensive and systematic overview of the autonomous research vessel Solgenia including the latest investigations and recently presented methods that contributed to the fields of autonomous systems, applied numerical optimization, nonlinear model predictive control, multi-extended-object-tracking, computer vision, and collision avoidance. These are considered to be the main components of autonomous water taxi applications. Autonomous water taxis have the potential to transform the traffic in cities close to the water into a more efficient, sustainable, and flexible future state. Regarding this transformation, the test platform Solgenia offers an opportunity to gain new insights by investigating novel methods in real-world experiments. An established test platform will strongly reduce the effort required for real-world experiments in the future.