Data Integration


Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with Noise

Ding, Xiucai, Shen, Chao, Wu, Hau-Tieng

arXiv.org Machine Learning

Multiview datasets are common in scientific and engineering applications, yet existing fusion methods offer limited theoretical guarantees, particularly in the presence of heterogeneous and high-dimensional noise. We propose Generalized Robust Adaptive-Bandwidth Multiview Diffusion Maps (GRAB-MDM), a new kernel-based diffusion geometry framework for integrating multiple noisy data sources. The key innovation of GRAB-MDM is a view-dependent bandwidth selection strategy that adapts to the geometry and noise level of each view, enabling a stable and principled construction of multiview diffusion operators. Under a common-manifold model, we establish asymptotic convergence results and show that the adaptive bandwidths lead to provably robust recovery of the shared intrinsic structure, even when noise levels and sensor dimensions differ across views. Numerical experiments demonstrate that GRAB-MDM significantly improves robustness and embedding quality compared with fixed-bandwidth and equal-bandwidth baselines, and usually outperforms existing algorithms. The proposed framework offers a practical and theoretically grounded solution for multiview sensor fusion in high-dimensional noisy environments.
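As a rough illustration of the general idea (not the paper's actual algorithm), the sketch below builds a Gaussian kernel per view with a bandwidth drawn from that view's own pairwise-distance distribution, combines the views multiplicatively, and embeds the data with the leading nontrivial eigenvectors of the resulting diffusion operator. The quantile heuristic, the multiplicative combination, and all function names are illustrative assumptions:

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X (n x d)."""
    sq = np.sum(X**2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * X @ X.T

def view_kernel(X, quantile=0.5):
    """Gaussian affinity for one view, with a bandwidth chosen from that
    view's own distance distribution (a simple median-style heuristic)."""
    D2 = np.maximum(pairwise_sq_dists(X), 0.0)
    off_diag = D2[~np.eye(len(X), dtype=bool)]
    eps = np.quantile(off_diag, quantile)      # view-dependent bandwidth
    return np.exp(-D2 / eps)

def multiview_diffusion_embedding(views, n_components=2):
    """Combine per-view affinities, row-normalize into a Markov matrix,
    and embed with its leading nontrivial eigenvectors."""
    W = np.ones((len(views[0]), len(views[0])))
    for X in views:
        W = W * view_kernel(X)                 # multiplicative fusion of views
    P = W / W.sum(axis=1, keepdims=True)       # diffusion (Markov) operator
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    idx = order[1:n_components + 1]            # skip the trivial constant eigenvector
    return vecs.real[:, idx] * vals.real[idx]

# Two noisy views of the same underlying 1-D circular structure.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 80, endpoint=False)
base = np.c_[np.cos(t), np.sin(t)]
view1 = base + 0.05 * rng.standard_normal((80, 2))                      # low-dim, low noise
view2 = base @ rng.standard_normal((2, 20)) + 0.3 * rng.standard_normal((80, 20))  # high-dim, noisy

emb = multiview_diffusion_embedding([view1, view2])
print(emb.shape)  # (80, 2)
```

Because each view picks its own bandwidth, the noisier high-dimensional view does not force an inappropriately large scale onto the cleaner one, which is the intuition behind the paper's adaptive-bandwidth design.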


Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model

Neural Information Processing Systems

Existing multi-modal image fusion methods fail to address the compound degradations present in source images, resulting in fusion images plagued by noise, color bias, improper exposure, etc. Additionally, these methods often overlook the specificity of foreground objects, weakening the salience of the objects of interest within the fused images. To address these challenges, this study proposes a novel interactive multi-modal image fusion framework based on the text-modulated diffusion model, called Text-DiFuse.


TEMPO-VINE: A Multi-Temporal Sensor Fusion Dataset for Localization and Mapping in Vineyards

Martini, Mauro, Ambrosio, Marco, Vilella-Cantos, Judith, Navone, Alessandro, Chiaberge, Marcello

arXiv.org Artificial Intelligence

In recent years, precision agriculture has been introducing groundbreaking innovations in the field, with a strong focus on automation. However, research studies in robotics and autonomous navigation often rely on controlled simulations or isolated field trials. The absence of a realistic common benchmark represents a significant limitation for the diffusion of robust autonomous systems under real complex agricultural conditions. Vineyards pose significant challenges due to their dynamic nature, and they are increasingly drawing attention from both academic and industrial stakeholders interested in automation. In this context, we introduce the TEMPO-VINE dataset, a large-scale multi-temporal dataset specifically designed for evaluating sensor fusion, simultaneous localization and mapping (SLAM), and place recognition techniques within operational vineyard environments. TEMPO-VINE is the first multi-modal public dataset that brings together data from heterogeneous LiDARs of different price levels, AHRS, RTK-GPS, and cameras in real trellis and pergola vineyards, with multiple rows exceeding 100 m in length. In this work, we address a critical gap in the landscape of agricultural datasets by providing researchers with a comprehensive data collection and ground truth trajectories in different seasons, vegetation growth stages, terrain and weather conditions. The sequence paths with multiple runs and revisits will foster the development of sensor fusion, localization, mapping and place recognition solutions for agricultural fields. The dataset, the processing tools and the benchmarking results will be available at the dedicated webpage upon acceptance.


A Comprehensive Survey on Surgical Digital Twin

Khan, Afsah Sharaf, Fan, Falong, Kim, Doohwan DH, Alshareef, Abdurrahman, Chen, Dong, Kim, Justin, Carter, Ernest, Liu, Bo, Rozenblit, Jerzy W., Zeigler, Bernard

arXiv.org Artificial Intelligence

Such models are integral to the development of context-aware surgical training systems and process monitoring platforms [11], [19] as well as for encoding adaptive robotic control policies in teleoperated environments [13], [20], [78]. However, their limited capacity to capture continuous biophysical dynamics can constrain their utility in applications where physiological fidelity is essential. Recognizing the limitations inherent in purely continuous or discrete approaches, hybrid modeling strategies have emerged as a state-of-the-art solution for surgical digital twins. These frameworks integrate continuous dynamic models with discrete state machines, enabling the simultaneous tracking of physiological changes and procedural events [8], [7], [19], [37]. For example, hybrid automata have been deployed to synchronize real-time updates of tissue deformation with the sequencing of surgical tool actions [7], [19]. This integration allows digital twins to provide context-sensitive support, adapting to abrupt workflow transitions and physiological perturbations alike, a critical requirement in both routine and emergent surgical scenarios [8], [11], [7].

B. Mutual Information and Information-Theoretic Approaches

With the proliferation of multi-modal surgical data, information-theoretic concepts have become indispensable for quantifying uncertainty, relevance, and redundancy across heterogeneous information streams. Mutual information I(X; Y) has been adopted as a rigorous metric for selecting the most informative sensors, imaging modalities, or clinical parameters, thereby enhancing the efficiency and robustness of digital twin-enabled decision support [2], [3], [13], [34], [11], [51], [48], [26], [29]. This is formally captured as Eq.
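The excerpt cites mutual information without its definition; for reference, the standard textbook form (a general identity, not a formula specific to this survey) is:

$$
I(X;Y) \;=\; \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)} \;=\; H(X) - H(X \mid Y)
$$

That is, mutual information measures how much observing one stream (say, a sensor reading) reduces uncertainty about another, which is why it serves as a ranking criterion for sensor and modality selection.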


A review on data fusion in multimodal learning analytics and educational data mining

Chango, Wilson, Lara, Juan A., Cerezo, Rebeca, Romero, Cristóbal

arXiv.org Artificial Intelligence

New educational models such as Smart Learning environments use digital and context-aware devices to facilitate the learning process. In this new educational scenario, a huge quantity of multimodal students' data from a variety of different sources can be captured, fused and analyzed. This offers researchers and educators a unique opportunity to discover new knowledge to better understand the learning process and to intervene if necessary. However, it is necessary to correctly apply data fusion approaches and techniques in order to combine various sources of Multimodal Learning Analytics (MLA) data. These sources or modalities in MLA include audio, video, electrodermal activity data, eye-tracking, user logs and click-stream data, but also learning artifacts and more natural human signals such as gestures, gaze, speech or writing. This survey introduces data fusion in Learning Analytics (LA) and Educational Data Mining (EDM) and how these data fusion techniques have been applied in Smart Learning. It shows the current state of the art by reviewing the main publications, the main types of fused educational data, and the data fusion approaches and techniques used in EDM/LA, as well as the main open problems, trends and challenges in this specific research area.
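To make the two most common fusion strategies discussed in such surveys concrete, here is a minimal sketch of feature-level (early) fusion and decision-level (late) fusion; the modality names, shapes, and function names are illustrative assumptions, not taken from the survey:

```python
import numpy as np

def feature_level_fusion(modalities):
    """Early fusion: z-score each modality's features, then concatenate.
    `modalities` is a list of (n_samples, n_features_i) arrays, one per
    source (e.g. eye-tracking statistics, clickstream counts, audio descriptors)."""
    scaled = []
    for X in modalities:
        mu, sd = X.mean(axis=0), X.std(axis=0)
        scaled.append((X - mu) / np.where(sd == 0, 1.0, sd))  # guard constant features
    return np.hstack(scaled)

def decision_level_fusion(prob_lists, weights=None):
    """Late fusion: weighted average of per-modality class probabilities."""
    P = np.stack(prob_lists)                   # (n_modalities, n_samples, n_classes)
    w = np.ones(len(P)) / len(P) if weights is None else np.asarray(weights)
    return np.tensordot(w, P, axes=1)          # (n_samples, n_classes)

rng = np.random.default_rng(1)
eye = rng.standard_normal((50, 4))                         # synthetic eye-tracking features
logs = rng.poisson(3.0, size=(50, 6)).astype(float)        # synthetic clickstream counts
fused = feature_level_fusion([eye, logs])
print(fused.shape)  # (50, 10)

p1 = rng.dirichlet(np.ones(3), size=50)                    # per-modality classifier outputs
p2 = rng.dirichlet(np.ones(3), size=50)
avg = decision_level_fusion([p1, p2])
print(avg.shape)  # (50, 3)
```

Early fusion lets a single model learn cross-modal interactions but requires aligned samples; late fusion tolerates heterogeneous pipelines per modality at the cost of losing those interactions, a trade-off the EDM/LA literature repeatedly revisits.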


Deep Learning Approach to Anomaly Detection in Enterprise ETL Processes with Autoencoders

Chen, Xin, Gadgil, Saili Uday, Gao, Kangning, Hu, Yi, Nie, Cong

arXiv.org Artificial Intelligence

An anomaly detection method based on deep autoencoders is proposed to address anomalies that often occur in enterprise-level ETL data streams. The study first analyzes multiple types of anomalies in ETL processes, including delays, missing values, duplicate loading, and sudden abnormal changes, and applies data standardization and feature modeling to ensure stable and usable inputs. In the method design, the encoder-decoder structure compresses high-dimensional inputs into latent representations and reconstructs them, while reconstruction error is used to measure anomaly levels. Regularization constraints are introduced in the latent space to enhance feature sparsity and distribution learning, thereby improving robustness in complex data streams. Systematic analyses under different hyperparameter settings, environmental changes, and data characteristics show that the proposed method achieves superior performance in AUC, ACC, Precision, and Recall. The results demonstrate that the deep autoencoder-based detection mechanism can effectively capture latent distribution patterns in enterprise-level ETL data streams and accurately identify diverse anomalies, providing reliable support for enterprise data processing and intelligent analysis.
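The pipeline described (standardize inputs, compress through an encoder-decoder with a sparsity constraint on the latent code, score rows by reconstruction error) can be sketched end to end in plain NumPy. This is a minimal single-hidden-layer stand-in for the paper's deep autoencoder, with synthetic stand-in "ETL metrics" and an arbitrary 99th-percentile threshold; none of these specifics come from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "ETL metrics": normal rows share a correlation structure,
# injected anomalies deviate sharply (e.g. sudden abnormal changes).
normal = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 8)) * 0.5
anomalies = normal[:10] + 6.0
X = np.vstack([normal, anomalies])
mu, sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sd                               # standardization step

# One-hidden-layer autoencoder (8 -> 3 -> 8) trained by plain gradient descent.
d, h = Xs.shape[1], 3
W1 = rng.standard_normal((d, h)) * 0.1; b1 = np.zeros(h)
W2 = rng.standard_normal((h, d)) * 0.1; b2 = np.zeros(d)
lr, lam = 0.01, 1e-3                             # lam: L1 sparsity weight on the code

Xtr = Xs[:500]                                   # train on (mostly) normal rows
for _ in range(2000):
    Z = np.tanh(Xtr @ W1 + b1)                   # latent code
    R = Z @ W2 + b2                              # reconstruction
    err = R - Xtr
    # Backprop of mean squared error plus the L1 penalty on the code.
    gR = 2 * err / len(Xtr)
    gW2 = Z.T @ gR; gb2 = gR.sum(axis=0)
    gZ = gR @ W2.T + lam * np.sign(Z) / len(Xtr)
    gA = gZ * (1 - Z**2)                         # tanh derivative
    gW1 = Xtr.T @ gA; gb1 = gA.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Score every row by reconstruction error; flag the tail as anomalous.
recon = np.tanh(Xs @ W1 + b1) @ W2 + b2
score = ((recon - Xs) ** 2).mean(axis=1)
threshold = np.quantile(score[:500], 0.99)
flags = score > threshold
print(flags[500:].sum(), "of 10 injected anomalies flagged")
```

The key property the paper relies on is visible even in this toy: rows resembling the training distribution reconstruct well, while anomalous rows land outside the learned manifold and incur large reconstruction error.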


FuseUNet: A Multi-Scale Feature Fusion Method for U-like Networks

He, Quansong, Min, Xiangde, Wang, Kaishen, He, Tao

arXiv.org Artificial Intelligence

Medical image segmentation is a critical task in computer vision, with UNet serving as a milestone architecture. A typical component of the UNet family is the skip connection; however, skip connections face two significant limitations: (1) they lack effective interaction between features at different scales, and (2) they rely on simple concatenation or addition operations, which constrain efficient information integration. While recent improvements to UNet have focused on enhancing encoder and decoder capabilities, these limitations remain overlooked. To overcome these challenges, we propose a novel multi-scale feature fusion method that reimagines the UNet decoding process as solving an initial value problem (IVP), treating skip connections as discrete nodes. By leveraging principles from the linear multistep method, we propose an adaptive ordinary differential equation method to enable effective multi-scale feature fusion. Our approach is independent of the encoder and decoder architectures, making it adaptable to various U-Net-like networks. Experiments on ACDC, KiTS2023, MSD brain tumor, and ISIC2017/2018 skin lesion segmentation datasets demonstrate improved feature utilization, reduced network parameters, and maintained high performance. The code is available at https://github.com/nayutayuki/FuseUNet.
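The "decoding as an initial value problem" idea can be illustrated with a classical two-step Adams-Bashforth update, one of the linear multistep methods the abstract alludes to. This sketch assumes skip features have already been resampled to a common shape (real U-Nets would project and upsample them first), and it is a conceptual stand-in rather than FuseUNet's actual adaptive scheme:

```python
import numpy as np

def ab2_decode(skips, h=1.0):
    """Treat the decoder state as the solution of an IVP whose 'derivative'
    evaluations are skip-connection features, combined with the two-step
    Adams-Bashforth rule:
        y_{k+1} = y_k + h * (3/2 * f_k - 1/2 * f_{k-1})
    `skips` holds same-shape feature maps ordered from coarse to fine."""
    y = skips[0]                      # initial value: the bottleneck feature
    f_prev = skips[0]
    for f in skips[1:]:
        y = y + h * (1.5 * f - 0.5 * f_prev)   # multistep fusion of two scales
        f_prev = f
    return y

rng = np.random.default_rng(0)
skips = [rng.standard_normal((1, 16, 8, 8)) for _ in range(4)]  # 4 skip levels
out = ab2_decode(skips)
print(out.shape)  # (1, 16, 8, 8)
```

Unlike plain concatenation, each update mixes the current and previous scale with signed weights, so information from earlier scales keeps influencing later decoding steps, which is the multi-scale interaction the paper argues plain skip connections lack.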


VeXKD: The Versatile Integration of Cross-Modal Fusion and Knowledge Distillation for 3D Perception

Ji, Yuzhe

Neural Information Processing Systems

Recent advancements in 3D perception have led to a proliferation of network architectures, particularly those involving multi-modal fusion algorithms. While these fusion algorithms improve accuracy, their complexity often impedes real-time performance. This paper introduces VeXKD, an effective and Versatile framework that integrates Cross-Modal Fusion with Knowledge Distillation. VeXKD applies knowledge distillation exclusively to the Bird's Eye View (BEV) feature maps, enabling the transfer of cross-modal insights to single-modal students without additional inference time overhead. It avoids volatile components that can vary across various 3D perception tasks and student modalities, thus improving versatility. The framework adopts a modality-general cross-modal fusion module to bridge the modality gap between the multi-modal teachers and single-modal students. Furthermore, leveraging byproducts generated during fusion, our BEV query guided mask generation network identifies crucial spatial locations across different BEV feature maps from different tasks and semantic levels in a data-driven manner, significantly enhancing the effectiveness of knowledge distillation. Extensive experiments on the nuScenes dataset demonstrate notable improvements, with up to 6.9%/4.2%
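The core distillation mechanic, matching student BEV features to teacher BEV features only at spatial cells a mask marks as important, reduces to a mask-weighted feature-mimicry loss. The sketch below is a generic illustration of that pattern (the mask here is random; in VeXKD it would come from the query-guided mask generation network), not the paper's exact loss:

```python
import numpy as np

def masked_bev_distill_loss(student_bev, teacher_bev, mask):
    """Mask-weighted MSE between student and teacher BEV feature maps
    of shape (C, H, W); only cells where mask > 0 contribute, focusing
    distillation on crucial spatial locations."""
    diff2 = (student_bev - teacher_bev) ** 2
    per_cell = diff2.mean(axis=0)                 # average over channels -> (H, W)
    return (per_cell * mask).sum() / (mask.sum() + 1e-6)

rng = np.random.default_rng(0)
teacher = rng.standard_normal((64, 32, 32))               # multi-modal teacher BEV
student = teacher + 0.1 * rng.standard_normal((64, 32, 32))  # single-modal student BEV
mask = (rng.random((32, 32)) > 0.7).astype(float)         # e.g. cells near objects
loss = masked_bev_distill_loss(student, teacher, mask)
print(float(loss))
```

Because the loss is computed only on BEV maps, the student's architecture elsewhere is unconstrained, which is what lets the transfer add no inference-time overhead.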


Requirements for Game-Based Learning Design Framework for Information System Integration in the Context of Post-Merger Integration

Lace, Ksenija, Kirikova, Marite

arXiv.org Artificial Intelligence

Post-merger integration poses unique challenges for professionals responsible for information system integration, which aims at aligning and combining the diverse system architectures of merging organizations. Although theoretical and practical guidance exists for post-merger integration on the business level, there is a significant gap in training for information system integration in this context. In prior research, the specific methods AMILI (support method for informed decision identification) and AMILP (support method for informed decision-making) were introduced to support information system integration decisions in post-merger integration. However, their practical application revealed a high learning curve and low learner motivation. This paper explores how game-based learning design can address these limitations by transforming static method training into an engaging learning experience. The study analyzes foundational learning theories, cognitive load and motivation models, and serious game design frameworks to identify the essential requirements for a game-based learning design framework tailored to information system integration in post-merger integration. Requirements are structured in two components: the transformation process and the resulting learning experience. The paper concludes with a plan for developing and evaluating the proposed framework through iterative design and real-world validation. Keywords: Post-merger integration, Information systems, Game-based learning, Instructional design, Serious games.


LLM-Based Data Science Agents: A Survey of Capabilities, Challenges, and Future Directions

Rahman, Mizanur, Bhuiyan, Amran, Islam, Mohammed Saidul, Laskar, Md Tahmid Rahman, Mahbub, Ridwan, Masry, Ahmed, Joty, Shafiq, Hoque, Enamul

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have enabled a new class of AI agents that automate multiple stages of the data science workflow by integrating planning, tool use, and multimodal reasoning across text, code, tables, and visuals. This survey presents the first comprehensive, lifecycle-aligned taxonomy of data science agents, systematically analyzing and mapping forty-five systems onto the six stages of the end-to-end data science process: business understanding and data acquisition, exploratory analysis and visualization, feature engineering, model building and selection, interpretation and explanation, and deployment and monitoring. In addition to lifecycle coverage, we annotate each agent along five cross-cutting design dimensions: reasoning and planning style, modality integration, tool orchestration depth, learning and alignment methods, and trust, safety, and governance mechanisms. Beyond classification, we provide a critical synthesis of agent capabilities, highlight strengths and limitations at each stage, and review emerging benchmarks and evaluation practices. Our analysis identifies three key trends: most systems emphasize exploratory analysis, visualization, and modeling while neglecting business understanding, deployment, and monitoring; multimodal reasoning and tool orchestration remain unresolved challenges; and over 90% lack explicit trust and safety mechanisms. We conclude by outlining open challenges in alignment stability, explainability, governance, and robust evaluation frameworks, and propose future research directions to guide the development of robust, trustworthy, low-latency, transparent, and broadly accessible data science agents.