 Information Fusion


A Digital Twin Framework for Physical-Virtual Integration in V2X-Enabled Connected Vehicle Corridors

arXiv.org Artificial Intelligence

Transportation Cyber-Physical Systems (T-CPS) are critical to improving traffic safety, reliability, and sustainability by integrating computing, communication, and control in transportation systems. The connected vehicle corridor is at the forefront of this transformation, where Cellular Vehicle-to-Everything (C-V2X) technology facilitates real-time data exchange between infrastructure, vehicles, and road users. However, challenges remain in processing and synchronizing the vast volume of V2X data from vehicles and roadside units, particularly in ensuring scalability, data integrity, and operational resilience. This paper presents a digital twin framework for T-CPS, developed from a real-world connected vehicle corridor, to address these challenges. By leveraging C-V2X technology and real-time data from infrastructure, vehicles, and road users, the digital twin accurately replicates vehicle behaviors, signal phases, and traffic patterns within the CARLA simulation environment. Extensive experiments demonstrate high fidelity between the physical and digital systems and robust synchronization of vehicle trajectories and signal phases. Moreover, the digital twin's scalable and redundant architecture enhances data integrity, making it capable of supporting future large-scale C-V2X deployments. The digital twin is a vital tool in T-CPS, enabling real-time traffic monitoring, prediction, and optimization to enhance the reliability and safety of transportation systems.
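One common way to quantify physical-virtual fidelity is to compare a vehicle's trajectory as reported over V2X with its replica in the simulator. The sketch below illustrates that idea under our own assumptions and is not the paper's method. Trajectories are hypothetical lists of `(t, x, y)` samples in a shared local metric frame. The twin trajectory is linearly interpolated at each physical timestamp, and fidelity is summarized as a position RMSE. The function names `interpolate_xy` and `trajectory_rmse` are hypothetical.

```python
from bisect import bisect_left
import math

# Hypothetical helpers (not from the paper): a trajectory is a list of
# (t, x, y) tuples sorted by timestamp t, all in one local metric frame.

def interpolate_xy(track, t):
    """Linearly interpolate (x, y) on `track` at time t, clamping at the ends."""
    times = [p[0] for p in track]
    i = bisect_left(times, t)  # first index with times[i] >= t
    if i == 0:
        return track[0][1], track[0][2]
    if i == len(track):
        return track[-1][1], track[-1][2]
    t0, x0, y0 = track[i - 1]
    t1, x1, y1 = track[i]
    a = (t - t0) / (t1 - t0)  # t0 < t <= t1, so the denominator is positive
    return x0 + a * (x1 - x0), y0 + a * (y1 - y0)

def trajectory_rmse(physical, twin):
    """Position RMSE of the twin, evaluated at each physical timestamp."""
    sq = 0.0
    for t, x, y in physical:
        tx, ty = interpolate_xy(twin, t)
        sq += (x - tx) ** 2 + (y - ty) ** 2
    return math.sqrt(sq / len(physical))
```

For example, a twin track `[(0.0, 0.0, 0.5), (2.0, 2.0, 0.5)]` that sits 0.5 m to the side of a physical track sampled at t = 0, 1, 2 along the x-axis yields an RMSE of 0.5. Interpolating the twin at the physical timestamps, rather than the reverse, keeps the error measured at the instants where ground-truth V2X data actually exists.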


Steering Prediction via a Multi-Sensor System for Autonomous Racing

arXiv.org Artificial Intelligence

Autonomous racing has rapidly gained research attention. Traditionally, racing cars rely on 2D LiDAR as their primary visual system. In this work, we explore integrating an event camera into the existing system to provide enhanced temporal information. Our goal is to fuse 2D LiDAR data with event data in an end-to-end learning framework for steering prediction, which is crucial for autonomous racing. To the best of our knowledge, this is the first study addressing this challenging research topic. We start by creating a multi-sensor dataset specifically for steering prediction. Using this dataset, we establish a benchmark by evaluating various state-of-the-art (SOTA) fusion methods. Our observations reveal that existing methods often incur substantial computational costs. To address this, we apply low-rank techniques to propose a novel, efficient, and effective fusion design. We also introduce a new fusion learning policy to guide the fusion process, enhancing robustness against misalignment. Our fusion architecture provides better steering prediction than LiDAR alone, significantly reducing the RMSE from 7.72 to 1.28. Compared with the second-best fusion method, our design uses only 11% of its learnable parameters while achieving better accuracy. The source code, dataset, and benchmark will be released to promote future research.
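A generic low-rank bilinear fusion of two feature vectors, in the spirit of low-rank multimodal fusion, shows where the parameter savings of low-rank techniques come from. A full fusion layer would hold one weight tensor over the outer product of the two bias-augmented features. The factorized form instead gives each modality `rank` factor matrices and sums the element-wise products of their projections. This is a minimal pure-Python sketch under our own assumptions. The feature sizes, rank, and function names are hypothetical, and the code is not the authors' fusion design or fusion learning policy.

```python
import random

# Generic low-rank bilinear fusion of two modality features (an illustration,
# not the paper's architecture). Each factor matrix is a list of rows:
# (feature_dim + 1) rows x out_dim columns.

def project(W, z):
    """Return W^T z for W with len(z) rows."""
    out_dim = len(W[0])
    return [sum(W[r][c] * z[r] for r in range(len(z))) for c in range(out_dim)]

def low_rank_fusion(z_lidar, z_event, factors_lidar, factors_event):
    # Append a constant 1 so unimodal terms survive the multiplicative fusion.
    zl = list(z_lidar) + [1.0]
    ze = list(z_event) + [1.0]
    out_dim = len(factors_lidar[0][0])
    fused = [0.0] * out_dim
    for Wl, We in zip(factors_lidar, factors_event):  # one term per rank
        pl, pe = project(Wl, zl), project(We, ze)
        for k in range(out_dim):
            fused[k] += pl[k] * pe[k]
    return fused

def random_factors(rank, in_dim, out_dim, seed=0):
    """Randomly initialised factors: rank x (in_dim + 1) x out_dim."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.1) for _ in range(out_dim)]
             for _ in range(in_dim + 1)] for _ in range(rank)]

def param_counts(d_lidar, d_event, out_dim, rank):
    """(full tensor, low-rank) parameter counts for the fusion layer."""
    full = (d_lidar + 1) * (d_event + 1) * out_dim
    low = rank * ((d_lidar + 1) + (d_event + 1)) * out_dim
    return full, low
```

With 128-dimensional features, a 64-dimensional output, and rank 4, `param_counts(128, 128, 64, 4)` returns `(1065024, 66048)`. The full tensor grows with the product of the feature sizes, while the factorized layer grows only with their sum.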


Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification

arXiv.org Artificial Intelligence

The early detection and nuanced subtype classification of non-small cell lung cancer (NSCLC), a predominant cause of cancer mortality worldwide, remain critical and complex problems. In this paper, we introduce an innovative integration of multi-modal data, synthesizing fused medical imaging (CT and PET scans) with clinical health records and genomic data. This unique fusion methodology leverages advanced machine learning models, notably MedClip and BEiT, for sophisticated image feature extraction, setting a new standard in computational oncology. Our research surpasses existing approaches, as evidenced by substantial improvements in NSCLC detection and classification precision. The results show notable improvements across key performance metrics, including accuracy, precision, recall, and F1-score. Specifically, our best-performing multi-modal classifier achieves an accuracy of 94.04%. We believe that our approach has the potential to transform NSCLC diagnostics, facilitating earlier detection and more effective treatment planning and, ultimately, leading to better patient outcomes in lung cancer care.


HSTFL: A Heterogeneous Federated Learning Framework for Misaligned Spatiotemporal Forecasting

arXiv.org Artificial Intelligence

Spatiotemporal forecasting has emerged as an indispensable building block of diverse smart city applications, such as intelligent transportation and smart energy management. Recent advances have shown that the performance of spatiotemporal forecasting can be significantly improved by integrating knowledge from geo-distributed time series data across different domains, e.g., enhancing real-estate appraisal with human mobility data, or jointly predicting taxi and bike demand. While effective, existing approaches assume a centralized data collection and exploitation environment, overlooking the privacy and commercial-interest concerns associated with data owned by different parties. In this paper, we investigate multi-party collaborative spatiotemporal forecasting without direct access to multi-source private data. This task is challenging due to 1) cross-domain feature heterogeneity and 2) cross-client geographical heterogeneity, under which standard horizontal or vertical federated learning is inapplicable. To this end, we propose a Heterogeneous SpatioTemporal Federated Learning (HSTFL) framework that enables multiple clients to collaboratively harness geo-distributed time series data from different domains while preserving privacy. Specifically, we first devise vertical federated spatiotemporal representation learning to locally preserve spatiotemporal dependencies among individual participants and generate effective representations for heterogeneous data. We then propose a cross-client virtual node alignment block that incorporates cross-client spatiotemporal dependencies via a multi-level knowledge fusion scheme. Extensive privacy analysis and experimental evaluations demonstrate that HSTFL not only effectively resists inference attacks but also provides significant improvements over various baselines.


AlterMOMA: Fusion Redundancy Pruning for Camera-LiDAR Fusion Models with Alternative Modality Masking

arXiv.org Artificial Intelligence

Camera-LiDAR fusion models significantly enhance perception performance in autonomous driving. The fusion mechanism leverages the strengths of each modality while minimizing their weaknesses. Moreover, in practice, camera-LiDAR fusion models use pre-trained backbones for efficient training. However, we argue that directly loading single-modal pre-trained camera and LiDAR backbones into camera-LiDAR fusion models introduces similar feature redundancy across modalities due to the nature of the fusion mechanism. Unfortunately, existing pruning methods are developed explicitly for single-modal models, and thus they struggle to identify these specific redundant parameters in camera-LiDAR fusion models. To address this issue, we propose a novel pruning framework, Alternative Modality Masking Pruning (AlterMOMA), which alternately masks each modality and identifies the redundant parameters. Specifically, when the parameters of one modality are masked (deactivated), the absence of features from the masked backbone compels the model to reactivate previously redundant features of the other modality's backbone. These redundant features, and the parameters associated with them, can therefore be identified through this reactivation process. The redundant parameters are then pruned using our proposed importance score evaluation function, Alternative Evaluation (AlterEva), which is based on the loss changes observed when certain modality parameters are activated and deactivated. Extensive experiments on the nuScenes and KITTI datasets, covering diverse tasks, baseline models, and pruning algorithms, show that AlterMOMA outperforms existing pruning methods and attains state-of-the-art performance.
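The signal this approach relies on is how much the loss changes when one modality is deactivated. A toy linear fusion model can show it. The sketch below is a hypothetical illustration only. The model, the data layout, and the name `masking_loss_changes` are our assumptions. It measures a modality-level masking ablation and does not implement AlterEva's importance scores.

```python
# Toy late-fusion regressor: prediction = w_cam . x_cam + w_lidar . x_lidar.
# Masking a modality zeroes its contribution, mimicking a deactivated backbone.

def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def predict(w_cam, w_lidar, x_cam, x_lidar, mask_cam=False, mask_lidar=False):
    cam = 0.0 if mask_cam else dot(w_cam, x_cam)
    lidar = 0.0 if mask_lidar else dot(w_lidar, x_lidar)
    return cam + lidar

def mse(w_cam, w_lidar, data, **mask):
    """Mean squared error over (x_cam, x_lidar, target) samples."""
    errs = [(predict(w_cam, w_lidar, xc, xl, **mask) - y) ** 2
            for xc, xl, y in data]
    return sum(errs) / len(errs)

def masking_loss_changes(w_cam, w_lidar, data):
    """Loss increase when each modality is masked, relative to the full model."""
    base = mse(w_cam, w_lidar, data)
    return {
        "camera_masked": mse(w_cam, w_lidar, data, mask_cam=True) - base,
        "lidar_masked": mse(w_cam, w_lidar, data, mask_lidar=True) - base,
    }
```

A small loss increase when one modality is masked suggests the other modality can compensate, meaning the two branches carry overlapping information. According to the abstract, AlterEva applies a related activation/deactivation idea at the level of modality parameters rather than whole branches.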


Fast Extrinsic Calibration for Multiple Inertial Measurement Units in Visual-Inertial System

arXiv.org Artificial Intelligence

In this paper, we propose a fast extrinsic calibration method for fusing multiple inertial measurement units (MIMU) to improve visual-inertial odometry (VIO) localization accuracy. Currently, data fusion algorithms for MIMU depend heavily on the number of inertial sensors. Assuming that the extrinsic parameters between inertial sensors are perfectly calibrated, these fusion algorithms achieve better localization accuracy with more IMUs, but they neglect the effect of extrinsic calibration error. Our method formulates two nonlinear least-squares problems to estimate the relative position and orientation of the MIMU separately, without relying on external sensors or on online estimation of inertial noise. We then give the general form of the virtual IMU (VIMU) method and propose its propagation on manifold. We evaluate our method on datasets, our self-made sensor board, and a board with different IMUs, validating its superiority over competing methods in speed, accuracy, and robustness. In simulation, we show that fusing only two IMUs calibrated with our method to predict motion can rival fusing nine IMUs. Real-world experiments demonstrate better localization accuracy of VIO integrated with our calibration method and VIMU propagation on manifold.
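Rigid-body kinematics explains why several rigidly mounted IMUs can be combined into one virtual IMU. Every IMU measures the same angular velocity ω. The accelerometer at lever arm r_i measures a_i = a + α × r_i + ω × (ω × r_i). The sketch below rests on several assumptions. Measurements are noise-free and already rotated into a common body frame using calibrated extrinsics. The lever arms `positions` are known relative to the virtual IMU origin. The angular acceleration `alpha` is known and defaults to zero. The code averages the lever-arm-compensated readings. It is a textbook illustration, not the paper's VIMU propagation on manifold, and all names are hypothetical.

```python
# Rigid-body virtual IMU (illustrative). All vectors are 3-element lists in a
# common body frame; positions are lever arms from the virtual IMU origin.

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def mean(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(3)]

def virtual_imu(gyros, accels, positions, alpha=(0.0, 0.0, 0.0)):
    """Return (omega, accel) at the virtual IMU origin."""
    omega = mean(gyros)  # every point of a rigid body shares one angular velocity
    compensated = []
    for a_i, r_i in zip(accels, positions):
        # Tangential (alpha x r) plus centripetal (omega x (omega x r)) terms.
        lever = add(cross(list(alpha), r_i), cross(omega, cross(omega, r_i)))
        compensated.append(sub(a_i, lever))
    return omega, mean(compensated)
```

If the virtual origin is placed at the centroid of the IMU positions (the r_i sum to zero), both lever-arm terms average out by linearity of the cross product. A plain average of the accelerometer readings then already gives the centroid's acceleration. Any error in the r_i or in the rotations into the common frame, however, leaks directly into the fused estimate, which is the calibration error the abstract highlights.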


UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection

arXiv.org Artificial Intelligence

4D millimeter-wave (MMW) radar, which provides both height information and denser point clouds than 3D MMW radar, has become increasingly popular in 3D object detection. In recent years, radar-vision fusion models have demonstrated performance close to that of LiDAR-based models, offering lower hardware costs and better resilience in extreme conditions. However, many radar-vision fusion models treat radar as a sparse LiDAR, underutilizing radar-specific information. Additionally, these multi-modal networks are often sensitive to the failure of a single modality, particularly vision. To address these challenges, we propose the Radar Depth Lift-Splat-Shoot (RDL) module, which integrates radar-specific data into the depth prediction process, enhancing the quality of visual Bird's-Eye View (BEV) features. We further introduce a Unified Feature Fusion (UFF) approach that extracts BEV features across different modalities using a shared module. To assess the robustness of multi-modal models, we develop a novel Failure Test (FT) ablation experiment, which simulates vision modality failure by injecting Gaussian noise. We conduct extensive experiments on the View-of-Delft (VoD) and TJ4D datasets. The results demonstrate that our proposed Unified BEVFusion (UniBEVFusion) network significantly outperforms state-of-the-art models on the TJ4D dataset, with improvements of 1.44 in 3D and 1.72 in BEV object detection accuracy.
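The Failure Test degrades the vision input with Gaussian noise. The sketch below shows a minimal way to apply that kind of perturbation to a flattened image with pixel values in [0, 1]. The noise standard deviation `sigma`, the clipping step, and the function name are assumptions, because the abstract does not specify the noise parameters.

```python
import random

def inject_gaussian_noise(pixels, sigma, seed=None, clip=True):
    """Return a copy of `pixels` (floats in [0, 1]) with N(0, sigma^2) noise added.

    Hypothetical helper: sigma and the clipping behaviour are assumptions,
    not the paper's Failure Test settings.
    """
    rng = random.Random(seed)  # seeded generator makes the test repeatable
    noisy = [p + rng.gauss(0.0, sigma) for p in pixels]
    if clip:
        noisy = [min(1.0, max(0.0, p)) for p in noisy]
    return noisy
```

Sweeping `sigma` upward while the radar input stays clean, and recording detection accuracy at each level, gives a simple robustness curve for the vision branch. Fixing `seed` makes the corrupted inputs identical across the models being compared.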


Cross-Target Stance Detection: A Survey of Techniques, Datasets, and Challenges

arXiv.org Artificial Intelligence

Stance detection is the task of determining the viewpoint expressed in a text towards a given target. One direction within the task focuses on cross-target stance detection, where a model trained on samples pertaining to certain targets is then applied to a new, unseen target. With the increasing need to analyze and mine viewpoints and opinions online, the task has recently seen a significant surge in interest. This review examines the advancements in cross-target stance detection over the last decade, highlighting the evolution from basic statistical methods to contemporary neural and LLM-based models. These advancements have led to notable improvements in accuracy and adaptability. Innovative approaches include the use of topic-grouped attention and adversarial learning for zero-shot detection, as well as fine-tuning techniques that enhance model robustness. Additionally, prompt-tuning methods and the integration of external knowledge have further refined model performance. A comprehensive overview of the datasets used for evaluating these models is also provided, offering valuable insights into the progress and challenges in the field. We conclude by highlighting emerging research directions and suggesting avenues for future work in the task.


Continual Learning for Multimodal Data Fusion of a Soft Gripper

arXiv.org Artificial Intelligence

Continual learning (CL) refers to the ability of an algorithm to continuously and incrementally acquire new knowledge from its environment while retaining previously learned information. A model trained on one data modality often fails when tested on a different modality. A straightforward approach is to fuse the two modalities by concatenating their features and training the model on the fused data. However, this requires retraining the model from scratch each time it encounters a new domain. In this paper, we introduce a continual learning algorithm capable of incrementally learning different data modalities by leveraging both class-incremental and domain-incremental learning scenarios in an artificial environment where labeled data is scarce, yet non-i.i.d. (not independent and identically distributed) unlabeled data from the environment is plentiful. The proposed algorithm is efficient and only requires storing prototypes for each class. We evaluate the algorithm's effectiveness on a challenging custom multimodal dataset comprising tactile data from a soft pneumatic gripper and visual data from non-stationary images of objects extracted from video sequences. Additionally, we conduct an ablation study on the custom dataset and the CORe50 dataset to highlight the contributions of different components of the algorithm. To further demonstrate the robustness of the algorithm, we perform a real-time object classification experiment using the soft gripper and an external independent camera setup, all synchronized with the Robot Operating System (ROS) framework.
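A nearest-class-mean classifier illustrates the idea of storing a prototype per class. Here each prototype is a running mean of feature vectors, so the classifier absorbs new classes, or new-domain samples of existing classes, without retraining from scratch. This is a generic sketch under our own assumptions: fixed-length feature vectors, Euclidean distance, and the hypothetical class name `PrototypeMemory`. It is not the paper's algorithm, which also exploits plentiful unlabeled data.

```python
import math

class PrototypeMemory:
    """Nearest-class-mean classifier storing one running-mean prototype per class."""

    def __init__(self):
        self.prototypes = {}  # label -> mean feature vector
        self.counts = {}      # label -> number of samples seen for that label

    def update(self, feature, label):
        if label not in self.prototypes:
            # A new class only adds one vector; existing prototypes are untouched.
            self.prototypes[label] = list(feature)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        proto = self.prototypes[label]
        # Incremental mean: m_n = m_{n-1} + (x - m_{n-1}) / n
        for i, x in enumerate(feature):
            proto[i] += (x - proto[i]) / n

    def predict(self, feature):
        """Return the label whose prototype is closest in Euclidean distance."""
        return min(self.prototypes,
                   key=lambda label: math.dist(feature, self.prototypes[label]))
```

Memory grows with the number of classes rather than the number of samples, which is why prototype storage suits continual learning on a robot. The trade-off is that a single mean per class cannot represent multi-modal class distributions.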


Accurately Tracking Relative Positions of Moving Trackers based on UWB Ranging and Inertial Sensing without Anchors

arXiv.org Artificial Intelligence

We present a tracking system for relative positioning that can operate on tracking nodes that are all in motion, without the need for stationary anchors. Each node embeds a 9-DOF magnetic and inertial measurement unit and a single-antenna ultra-wideband (UWB) radio. We introduce a multi-stage filtering pipeline through which our system estimates the relative layout of all tracking nodes within the group. The key novelty of our method is the integration of a custom Extended Kalman filter (EKF) with a refinement step via multidimensional scaling (MDS). Our method feeds the MDS output back into the EKF, thereby creating a dynamic feedback loop for more robust estimates. We complement our method with a UWB ranging protocol that we designed to allow tracking nodes to opportunistically join and leave the group. In our evaluation with constantly moving nodes, our system estimated relative positions with an error of 10.2 cm (in 2D) and 21.7 cm (in 3D), including in the presence of obstacles that occluded the line of sight between tracking nodes. Our approach requires no external infrastructure, making it particularly suitable for environments where stationary setups are impractical.
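The MDS refinement step recovers a relative layout from pairwise ranges. Below is a minimal classical (Torgerson) MDS in pure Python. It double-centers the squared distance matrix and extracts the top eigenvectors by power iteration with deflation. It is an illustration under our own assumptions: a complete, approximately Euclidean distance matrix and the hypothetical function name `classical_mds`. It does not include the paper's EKF or its feedback loop.

```python
import math
import random

def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed n nodes in `dims` dimensions from an n x n pairwise distance matrix."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]  # equals column means (d2 is symmetric)
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J gives the Gram matrix of centered points.
    B = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector of the (deflated) B.
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * Bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))  # clamp small negative values from noise
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next-largest eigenvector.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

The recovered layout is defined only up to rotation, reflection, and translation, which is inherent to anchor-free relative positioning. Pass `dims=3` for 3D layouts. With noisy UWB ranges, B is no longer exactly positive semidefinite, which is one motivation for combining MDS with a filter such as the EKF described above.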