Information Fusion
Raci-Net: Ego-vehicle Odometry Estimation in Adverse Weather Conditions
Talebi, Mohammadhossein, Dahal, Pragyan, Possenti, Davide, Arrigoni, Stefano, Braghin, Francesco
Autonomous driving systems depend heavily on sensors such as cameras, LiDAR, and inertial measurement units (IMUs) to perceive the environment and estimate their motion. Among these, perception-based sensors are vulnerable to harsh weather and technical failures. Although existing methods are robust against common technical issues such as rotational misalignment and disconnection, they often degrade when faced with dynamic environmental factors such as weather conditions. To address these problems, this research introduces a novel deep learning-based motion estimator that integrates visual, inertial, and millimeter-wave radar data, exploiting each sensor's strengths to improve odometry estimation accuracy and reliability under adverse environmental conditions such as snow, rain, and varying light. The proposed model uses sensor fusion techniques that dynamically adjust the contribution of each sensor based on the current environmental condition, with radar compensating for visual sensor limitations in poor visibility. This work explores recent advancements in radar-based odometry and highlights that radar's robustness across weather conditions makes it a valuable component for pose estimation systems, specifically when visual sensors are degraded. Experimental results on the Boreas dataset showcase the robustness and effectiveness of the model in both clear and degraded environments.
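As a rough illustration of dynamically adjusting each sensor's contribution, the sketch below weights per-sensor feature vectors with a softmax over reliability scores. The abstract does not specify the fusion mechanism, so the softmax gating, the function names `softmax` and `fuse_features`, and the example scores are all assumptions made for illustration, not the authors' architecture.

```python
import math


def softmax(scores):
    """Turn arbitrary reliability scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features, scores):
    """Weighted sum of per-sensor feature vectors (hypothetical gating).

    features: one equal-length feature vector per sensor.
    scores: one reliability score per sensor; higher means more trusted.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


# Example: in heavy snow the camera score is low, so radar dominates.
camera, radar = [1.0, 0.0], [0.0, 1.0]
fused = fuse_features([camera, radar], scores=[-2.0, 2.0])
```

In a learned model the scores would typically come from a small network conditioned on the sensor inputs rather than being set by hand.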
Unmanned Aerial Vehicle (UAV) Data-Driven Modeling Software with Integrated 9-Axis IMU-GPS Sensor Fusion and Data Filtering Algorithm
Arfakhsyad, Azfar Azdi, Rahman, Aufa Nasywa, Kinanti, Larasati, Rizqi, Ahmad Ataka Awwalur, Muhammad, Hannan Nur
Unmanned Aerial Vehicles (UAVs) have emerged as versatile platforms, driving the demand for accurate modeling to support developmental testing. This paper proposes data-driven modeling software for UAVs. It emphasizes the use of cost-effective sensors to obtain orientation and location data, which are then processed with data filtering algorithms and sensor fusion techniques to improve data quality and enable precise model visualization in the software. The UAV's orientation is obtained from processed Inertial Measurement Unit (IMU) data and represented as a quaternion to avoid the gimbal lock problem. The UAV's location is determined by combining data from the Global Positioning System (GPS), which provides stable geographic coordinates but a slower update frequency, and the accelerometer, which has a higher update frequency but yields unstable position estimates when integrated because of accumulated error. By combining data from these two sensors, the software is able to calculate and continuously update the UAV's real-time position during flight operations. The results show that the software renders UAV orientation and position with a high degree of accuracy and fluidity. Unmanned Aerial Vehicles (UAVs) have rapidly evolved as a versatile platform for various applications [1]. The increasing demand for UAV development to handle complex environments raises the need for accurate and reliable simulation models that faithfully represent the dynamic behavior of the UAV. A tested, accurate simulation model of a UAV allows developers to perform cost-effective analysis and evaluation while also validating the performance of the UAV under real-world scenarios.
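The GPS and accelerometer combination described above can be sketched as a one-dimensional complementary filter: the accelerometer is integrated at a high rate, and each sparse GPS fix pulls the drifting estimate back. The abstract does not name the filter the software uses, so the filter choice, the function name `fuse_position`, and the `gain` parameter are assumptions for illustration only.

```python
def fuse_position(accel_samples, gps_fixes, dt, gain=0.2):
    """1-D complementary filter: dead-reckon with the accelerometer, correct with GPS.

    accel_samples: acceleration in m/s^2, one sample per time step of length dt.
    gps_fixes: dict mapping step index -> GPS position in metres (sparse).
    gain: hypothetical blending weight in [0, 1] applied at each GPS fix.
    """
    position, velocity = 0.0, 0.0
    track = []
    for step, accel in enumerate(accel_samples):
        # Prediction: integrate acceleration (high rate, but error accumulates).
        velocity += accel * dt
        position += velocity * dt
        # Correction: move the estimate toward the GPS fix when one arrives.
        if step in gps_fixes:
            position += gain * (gps_fixes[step] - position)
        track.append(position)
    return track


# A stationary vehicle with a biased accelerometer (0.1 m/s^2 offset).
biased = [0.1] * 100
drift_only = fuse_position(biased, {}, dt=0.1)
with_gps = fuse_position(biased, {s: 0.0 for s in range(9, 100, 10)}, dt=0.1, gain=0.5)
```

Without GPS the estimate drifts several metres in ten seconds; with a fix every tenth step the error stays smaller. A full Kalman filter would also correct the velocity and tune the gain from the sensor noise levels.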
Towards Robust Sensor-Fusion Ground SLAM: A Comprehensive Benchmark and A Resilient Framework
Zhang, Deteng, Zhang, Junjie, Sun, Yan, Li, Tao, Yin, Hao, Xie, Hongzhao, Yin, Jie
Considerable advancements have been achieved in SLAM methods tailored for structured environments, yet their robustness under challenging corner cases remains a critical limitation. Although multi-sensor fusion approaches integrating diverse sensors have shown promising performance improvements, the research community faces two key barriers. On the one hand, the lack of standardized and configurable benchmarks that systematically evaluate SLAM algorithms under diverse degradation scenarios hinders comprehensive performance assessment. On the other hand, existing SLAM frameworks primarily focus on fusing a limited set of sensor types, without effectively addressing adaptive sensor selection strategies for varying environmental conditions. To bridge these gaps, we make three key contributions. First, we introduce the M3DGR dataset, a sensor-rich benchmark with systematically induced degradation patterns including visual challenge, LiDAR degeneracy, wheel slippage, and GNSS denial. Second, we conduct a comprehensive evaluation of forty SLAM systems on M3DGR, providing critical insights into their robustness and limitations under challenging real-world conditions. Third, we develop a resilient modular multi-sensor fusion framework named Ground-Fusion++, which demonstrates robust performance by coupling GNSS, RGB-D, LiDAR, IMU (Inertial Measurement Unit), and wheel odometry. Code and datasets are publicly available.
Multi-view mid fusion: a universal approach for learning in an HDLSS setting
The high-dimensional low-sample-size (HDLSS) setting presents significant challenges in various applications where the feature dimension far exceeds the number of available samples. This paper introduces a universal approach for learning in HDLSS settings using multi-view mid fusion techniques. It shows how existing mid fusion multi-view methods perform well in an HDLSS setting even if no inherent views are provided. Three view construction methods are proposed that split the high-dimensional feature vectors into smaller subsets, each representing a different view. Extensive experimental validation across model types and learning tasks confirms the effectiveness and generalization of the approach. We believe the work in this paper lays the foundation for further research into the universal benefits of multi-view mid fusion learning.
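To make the idea of constructing views from a single feature vector concrete, the sketch below splits feature indices into disjoint random subsets. The abstract does not describe its three construction methods, so this random split, along with the names `split_into_views` and `take_view`, is an assumed stand-in rather than one of the paper's methods.

```python
import random


def split_into_views(n_features, n_views, seed=0):
    """Partition feature indices into disjoint views (hypothetical random split)."""
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal indices round-robin so view sizes differ by at most one.
    return [sorted(indices[v::n_views]) for v in range(n_views)]


def take_view(sample, view):
    """Select the features of one sample that belong to a given view."""
    return [sample[i] for i in view]


# Example: 10 features split into 3 views; each view feeds its own encoder
# before the mid fusion step combines the intermediate representations.
views = split_into_views(n_features=10, n_views=3)
sample = [float(i) for i in range(10)]
per_view_inputs = [take_view(sample, v) for v in views]
```

Each view sees only a fraction of the features, which lowers the per-encoder input dimension relative to the sample count.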
Feature Geometry for Stereo Sidescan and Forward-looking Sonar
Norman, Kalin, Mangelson, Joshua G.
In this paper, we address stereo acoustic data fusion for marine robotics and propose a geometry-based method for projecting observed features from one sonar to another for a cross-modal stereo sonar setup that consists of both a forward-looking and a sidescan sonar. Our acoustic geometry for sidescan and forward-looking sonar is inspired by the epipolar geometry for stereo cameras, and we leverage relative pose information to project where an observed feature in one sonar image will be found in the image of another sonar. Additionally, we analyze how both the feature location relative to the sonar and the relative pose between the two sonars impact the projection. From simulated results, we identify desirable stereo configurations for applications in field robotics like feature correspondence and recovery of the 3D information of the feature. Field robotic applications, such as localization and mapping, in underwater environments face significant challenges due to the complex and dynamic nature of the marine domain.
Hierarchical Semantic-Visual Fusion of Visible and Near-infrared Images for Long-range Haze Removal
Li, Yi, Wang, Xiaoxiong, Wang, Jiawei, Chang, Yi, Cao, Kai, Yan, Luxin
While image dehazing has advanced substantially in the past decade, most efforts have focused on short-range scenarios, leaving long-range haze removal under-explored. As distance increases, intensified scattering leads to severe haze and signal loss, making it impractical to recover distant details solely from visible images. Near-infrared, with superior fog penetration, offers critical complementary cues through multimodal fusion. However, existing methods focus on content integration while often neglecting haze embedded in visible images, leading to results with residual haze. In this work, we argue that the infrared and visible modalities not only provide complementary low-level visual features, but also share high-level semantic consistency. Motivated by this, we propose a Hierarchical Semantic-Visual Fusion (HSVF) framework, comprising a semantic stream to reconstruct haze-free scenes and a visual stream to incorporate structural details from the near-infrared modality. The semantic stream first acquires haze-robust semantic prediction by aligning modality-invariant intrinsic representations. Then the shared semantics act as strong priors to restore clear and high-contrast distant scenes under severe haze degradation. In parallel, the visual stream focuses on recovering lost structural details from near-infrared by fusing complementary cues from both visible and near-infrared images. Through the cooperation of dual streams, HSVF produces results that exhibit both high-contrast scenes and rich texture details. Moreover, we introduce a novel pixel-aligned visible-infrared haze dataset with semantic labels to facilitate benchmarking. Extensive experiments demonstrate the superiority of our method over state-of-the-art approaches in real-world long-range haze removal. Haze, especially long-range haze, has been shown to not only degrade visual quality substantially, but also impair numerous high-level vision tasks [1]-[10].
Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis
Luo, Miaosen, Jiang, Yuncheng, Mai, Sijie
Multimodal Sentiment Analysis (MSA) faces two critical challenges: the lack of interpretability in the decision logic of multimodal fusion and modality imbalance caused by disparities in inter-modal information density. To address these issues, we propose KAN-MCP, a novel framework that integrates the interpretability of Kolmogorov-Arnold Networks (KAN) with the robustness of the Multimodal Clean Pareto (MCPareto) framework. First, KAN leverages its univariate function decomposition to achieve transparent analysis of cross-modal interactions. This structural design allows direct inspection of feature transformations without relying on external interpretation tools, thereby ensuring both high expressiveness and interpretability. Second, the proposed MCPareto enhances robustness by addressing modality imbalance and noise interference. Specifically, we introduce the Dimensionality Reduction and Denoising Modal Information Bottleneck (DRD-MIB) method, which jointly denoises and reduces feature dimensionality. This approach provides KAN with discriminative low-dimensional inputs to reduce the modeling complexity of KAN while preserving critical sentiment-related information. Furthermore, MCPareto dynamically balances gradient contributions across modalities using the purified features output by DRD-MIB, ensuring lossless transmission of auxiliary signals and effectively alleviating modality imbalance. This synergy of interpretability and robustness not only achieves superior performance on benchmark datasets such as CMU-MOSI, CMU-MOSEI, and CH-SIMS v2 but also offers an intuitive visualization interface through KAN's interpretable architecture. Our code is released on https://github.com/LuoMSen/KAN-MCP.
CoInfra: A Large-Scale Cooperative Infrastructure Perception System and Dataset in Adverse Weather
Ning, Minghao, Yang, Yufeng, Shu, Keqi, Huang, Shucheng, Zhong, Jiaming, Salehi, Maryam, Rahmani, Mahdi, Lu, Yukun, Sun, Chen, Saleh, Aladdin, Hashemi, Ehsan, Khajepour, Amir
We present CoInfra, a large-scale cooperative infrastructure perception system and dataset designed to advance robust multi-agent perception under real-world and adverse weather conditions. The CoInfra system includes 14 fully synchronized sensor nodes, each equipped with dual RGB cameras and a LiDAR, deployed across a shared region and operating continuously to capture all traffic participants in real time. This paper provides a robust, delay-aware synchronization protocol and a scalable system architecture that supports real-time data fusion, OTA management, and remote monitoring. The dataset was collected in different weather scenarios, including sunny, rainy, freezing rain, and heavy snow, and includes 195k LiDAR frames and 390k camera images from 8 infrastructure nodes that are globally time-aligned and spatially calibrated. Furthermore, comprehensive 3D bounding box annotations for five object classes (i.e., car, bus, truck, person, and bicycle) are provided in both global and individual node frames, along with high-definition maps for contextual understanding. Baseline experiments demonstrate the trade-offs between early and late fusion strategies, and the significant benefits of HD map integration are discussed. By openly releasing our dataset, codebase, and system documentation at https://github.com/NingMingHao/CoInfra, we aim to enable reproducible research and drive progress in infrastructure-supported autonomous driving, particularly in challenging, real-world settings.
Consistency-Aware Padding for Incomplete Multi-Modal Alignment Clustering Based on Self-Repellent Greedy Anchor Search
Ma, Shubin, Zhao, Liang, Lu, Mingdong, Guo, Yifan, Xu, Bo
Multimodal representations describe the characteristics of real-world data samples faithfully and effectively by capturing their complementary information. However, the collected data often exhibits incomplete and misaligned characteristics due to factors such as inconsistent sensor frequencies and device malfunctions. Existing research has not effectively addressed the issue of filling missing data in scenarios where multiview data are both imbalanced and misaligned. Instead, it relies on class-level alignment of the available data. As a result, some data samples are not well matched, which degrades the quality of data fusion. In this paper, we propose Consistency-Aware Padding for Incomplete Multimodal Alignment Clustering Based on Self-Repellent Greedy Anchor Search (CAPIMAC) to tackle the problem of filling imbalanced and misaligned data in multimodal datasets. Specifically, we propose a self-repellent greedy anchor search module (SRGASM), which employs a self-repellent random walk combined with a greedy algorithm to identify anchor points for re-representing incomplete and misaligned multimodal data. Subsequently, based on noise-contrastive learning, we design a consistency-aware padding module (CAPM) to effectively interpolate and align imbalanced and misaligned data, thereby improving the quality of multimodal data fusion. Experimental results demonstrate the superiority of our method on benchmark datasets.
Evaluation of an Uncertainty-Aware Late Fusion Algorithm for Multi-Source Bird's Eye View Detections Under Controlled Noise
Fadili, Maryem, Lecrosnier, Louis, Pechberti, Steve, Khemmar, Redouane
Reliable multi-source fusion is crucial for robust perception in autonomous systems. However, evaluating fusion performance independently of detection errors remains challenging. This work introduces a systematic evaluation framework that injects controlled noise into ground-truth bounding boxes to isolate the fusion process. We then propose Unified Kalman Fusion (UniKF), a late-fusion algorithm based on Kalman filtering to merge Bird's Eye View (BEV) detections while handling synchronization issues. Experiments show that UniKF outperforms baseline methods across various noise levels, achieving up to 3× lower object positioning and orientation errors and 2× lower dimension estimation errors, while maintaining near-perfect precision and recall between 99.5% and 100%. Accurate perception is fundamental for autonomous driving, especially in complex urban settings where sensor occlusions, limited range, and adverse weather degrade detection quality [1]. Collaborative perception, enabled by communication among onboard sensors and by Vehicle-to-Everything (V2X) communication, enhances perception by sharing sensor data across multiple sensors or agents [2], [3]. Early fusion methods require high bandwidth and strict time synchronization. Deep fusion demands access to proprietary models, which is impractical due to privacy and intellectual property restrictions. Late fusion, which operates at the object detection level, offers a scalable, bandwidth-efficient, and detector-model-agnostic alternative.
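The two ingredients named above, controlled noise on ground-truth boxes and Kalman-style merging of detections, can be sketched in a few lines. This is not UniKF: the box layout `(x, y, length, width, yaw)`, the Gaussian noise model, and the function names `inject_noise` and `fuse_scalar` are assumptions chosen for illustration, and the fusion shown is only the scalar Kalman measurement update.

```python
import random


def inject_noise(box, pos_sigma, yaw_sigma, rng):
    """Return a noisy copy of a BEV box given as (x, y, length, width, yaw).

    Only position and yaw are perturbed here; dimensions are left unchanged.
    """
    x, y, length, width, yaw = box
    return (x + rng.gauss(0.0, pos_sigma),
            y + rng.gauss(0.0, pos_sigma),
            length, width,
            yaw + rng.gauss(0.0, yaw_sigma))


def fuse_scalar(est_a, var_a, est_b, var_b):
    """Scalar Kalman update: refine estimate a with an independent measurement b."""
    gain = var_a / (var_a + var_b)           # trust b more when a is uncertain
    fused = est_a + gain * (est_b - est_a)
    fused_var = (1.0 - gain) * var_a         # fused variance never exceeds var_a
    return fused, fused_var


# Two sources observe the same ground-truth box with different noise levels.
rng = random.Random(0)
truth = (10.0, 5.0, 4.5, 1.8, 0.0)
source_a = inject_noise(truth, pos_sigma=0.5, yaw_sigma=0.05, rng=rng)
source_b = inject_noise(truth, pos_sigma=0.2, yaw_sigma=0.02, rng=rng)
x_fused, x_var = fuse_scalar(source_a[0], 0.5 ** 2, source_b[0], 0.2 ** 2)
```

Because the noise is injected at known levels, the fused error can be compared against ground truth without any detector in the loop. A practical version would also handle angle wrap-around for yaw and time offsets between sources.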