Goto

Collaborating Authors

 Information Fusion


V2X-ReaLO: An Open Online Framework and Dataset for Cooperative Perception in Reality

arXiv.org Artificial Intelligence

Cooperative perception enabled by Vehicle-to-Everything (V2X) communication holds significant promise for enhancing the perception capabilities of autonomous vehicles, allowing them to overcome occlusions and extend their field of view. However, existing research predominantly relies on simulated environments or static datasets, leaving the feasibility and effectiveness of V2X cooperative perception especially for intermediate fusion in real-world scenarios largely unexplored. In this work, we introduce V2X-ReaLO, an open online cooperative perception framework deployed on real vehicles and smart infrastructure that integrates early, late, and intermediate fusion methods within a unified pipeline and provides the first practical demonstration of online intermediate fusion's feasibility and performance under genuine real-world conditions. Additionally, we present an open benchmark dataset specifically designed to assess the performance of online cooperative perception systems. This new dataset extends V2X-Real dataset to dynamic, synchronized ROS bags and provides 25,028 test frames with 6,850 annotated key frames in challenging urban scenarios. By enabling real-time assessments of perception accuracy and communication lantency under dynamic conditions, V2X-ReaLO sets a new benchmark for advancing and optimizing cooperative perception systems in real-world applications. The codes and datasets will be released to further advance the field.


The Algorithmic State Architecture (ASA): An Integrated Framework for AI-Enabled Government

arXiv.org Artificial Intelligence

As artificial intelligence transforms public sector operations, governments struggle to integrate technological innovations into coherent systems for effective service delivery. This paper introduces the Algorithmic State Architecture (ASA), a novel four-layer framework conceptualising how Digital Public Infrastructure, Data-for-Policy, Algorithmic Government/Governance, and GovTech interact as an integrated system in AI-enabled states. Unlike approaches that treat these as parallel developments, ASA positions them as interdependent layers with specific enabling relationships and feedback mechanisms. Through comparative analysis of implementations in Estonia, Singapore, India, and the UK, we demonstrate how foundational digital infrastructure enables systematic data collection, which powers algorithmic decision-making processes, ultimately manifesting in user-facing services. Our analysis reveals that successful implementations require balanced development across all layers, with particular attention to integration mechanisms between them. The framework contributes to both theory and practice by bridging previously disconnected domains of digital government research, identifying critical dependencies that influence implementation success, and providing a structured approach for analysing the maturity and development pathways of AI-enabled government systems.


FedMSGL: A Self-Expressive Hypergraph Based Federated Multi-View Learning

arXiv.org Artificial Intelligence

Federated learning is essential for enabling collaborative model training across decentralized data sources while preserving data privacy and security. This approach mitigates the risks associated with centralized data collection and addresses concerns related to data ownership and compliance. Despite significant advancements in federated learning algorithms that address communication bottlenecks and enhance privacy protection, existing works overlook the impact of differences in data feature dimensions, resulting in global models that disproportionately depend on participants with large feature dimensions. Additionally, current single-view federated learning methods fail to account for the unique characteristics of multi-view data, leading to suboptimal performance in processing such data. To address these issues, we propose a Self-expressive Hypergraph Based Federated Multi-view Learning method (FedMSGL). The proposed method leverages self-expressive character in the local training to learn uniform dimension subspace with latent sample relation. At the central side, an adaptive fusion technique is employed to generate the global model, while constructing a hypergraph from the learned global and view-specific subspace to capture intricate interconnections across views. Experiments on multi-view datasets with different feature dimensions validated the effectiveness of the proposed method.


SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework

arXiv.org Artificial Intelligence

Traditional single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency in bandwidth-constrained environments. Additionally, single-task-oriented sensing systems fail to address users' diverse demands. To overcome these challenges, we propose a semantic-driven integrated multimodal sensing and communication (SIMAC) framework. This framework leverages a joint source-channel coding architecture to achieve simultaneous sensing decoding and transmission of sensing results. Specifically, SIMAC first introduces a multimodal semantic fusion (MSF) network, which employs two extractors to extract semantic information from radar signals and images, respectively. MSF then applies cross-attention mechanisms to fuse these unimodal features and generate multimodal semantic representations. Secondly, we present a large language model (LLM)-based semantic encoder (LSE), where relevant communication parameters and multimodal semantics are mapped into a unified latent space and input to the LLM, enabling channel-adaptive semantic encoding. Thirdly, a task-oriented sensing semantic decoder (SSD) is proposed, in which different decoded heads are designed according to the specific needs of tasks. Simultaneously, a multi-task learning strategy is introduced to train the SIMAC framework, achieving diverse sensing services. Finally, experimental simulations demonstrate that the proposed framework achieves diverse sensing services and higher accuracy.


Enhancing Sentiment Analysis through Multimodal Fusion: A BERT-DINOv2 Approach

arXiv.org Artificial Intelligence

Multimodal sentiment analysis enhances conventional sentiment analysis, which traditionally relies solely on text, by incorporating information from different modalities such as images, text, and audio. This paper proposes a novel multimodal sentiment analysis architecture that integrates text and image data to provide a more comprehensive understanding of sentiments. For text feature extraction, we utilize BERT, a natural language processing model. For image feature extraction, we employ DINOv2, a vision-transformer-based model. The textual and visual latent features are integrated using proposed fusion techniques, namely the Basic Fusion Model, Self-Attention Fusion Model, and Dual-Attention Fusion Model. Experiments on three datasets--the Memotion 7k dataset, MVSA-single dataset, and MVSA-multi dataset--demonstrate the viability and practicality of the proposed multimodal architecture.


Probabilistic Segmentation for Robust Field of View Estimation

arXiv.org Artificial Intelligence

Attacks on sensing and perception threaten the safe deployment of autonomous vehicles (AVs). Security-aware sensor fusion helps mitigate threats but requires accurate field of view (FOV) estimation which has not been evaluated autonomy. To address this gap, we adapt classical computer graphics algorithms to develop the first autonomy-relevant FOV estimators and create the first datasets with ground truth FOV labels. Unfortunately, we find that these approaches are themselves highly vulnerable to attacks on sensing. To improve robustness of FOV estimation against attacks, we propose a learning-based segmentation model that captures FOV features, integrates Monte Carlo dropout (MCD) for uncertainty quantification, and performs anomaly detection on confidence maps. We illustrate through comprehensive evaluations attack resistance and strong generalization across environments. Architecture trade studies demonstrate the model is feasible for real-time deployment in multiple applications.


Availability-aware Sensor Fusion via Unified Canonical Space for 4D Radar, LiDAR, and Camera

arXiv.org Artificial Intelligence

Sensor fusion of camera, LiDAR, and 4-dimensional (4D) Radar has brought a significant performance improvement in autonomous driving (AD). However, there still exist fundamental challenges: deeply coupled fusion methods assume continuous sensor availability, making them vulnerable to sensor degradation and failure, whereas sensor-wise cross-attention fusion methods struggle with computational cost and unified feature representation. This paper presents availability-aware sensor fusion (ASF), a novel method that employs unified canonical projection (UCP) to enable consistency in all sensor features for fusion and cross-attention across sensors along patches (CASAP) to enhance robustness of sensor fusion against sensor degradation and failure. As a result, the proposed ASF shows a superior object detection performance to the existing state-of-the-art fusion methods under various weather and sensor degradation (or failure) conditions; Extensive experiments on the K-Radar dataset demonstrate that ASF achieves improvements of 9.7% in AP BEV (87.2%) and 20.1% in AP 3D (73.6%) in object detection at IoU=0.5, while requiring a low computational cost. The code will be available at https://github.com/kaist-avelab/K-Radar.


AKF-LIO: LiDAR-Inertial Odometry with Gaussian Map by Adaptive Kalman Filter

arXiv.org Artificial Intelligence

Existing LiDAR-Inertial Odometry (LIO) systems typically use sensor-specific or environment-dependent measurement covariances during state estimation, leading to laborious parameter tuning and suboptimal performance in challenging conditions (e.g., sensor degeneracy and noisy observations). Therefore, we propose an Adaptive Kalman Filter (AKF) framework that dynamically estimates time-varying noise covariances of LiDAR and Inertial Measurement Unit (IMU) measurements, enabling context-aware confidence weighting between sensors. During LiDAR degeneracy, the system prioritizes IMU data while suppressing contributions from unreliable inputs like moving objects or noisy point clouds. Furthermore, a compact Gaussian-based map representation is introduced to model environmental planarity and spatial noise. A correlated registration strategy ensures accurate plane normal estimation via pseudo-merge, even in unstructured environments like forests. Extensive experiments validate the robustness of the proposed system across diverse environments, including dynamic scenes and geometrically degraded scenarios. Our method achieves reliable localization results across all MARS-LVIG sequences and ranks 8th on the KITTI Odometry Benchmark. The code will be released at https://github.com/xpxie/AKF-LIO.git.


Enhancing Thin-Film Wafer Inspection With A Multi-Sensor Array And Robot Constraint Maintenance

arXiv.org Artificial Intelligence

Thin-film inspection on large-area substrates in coating manufacture remains a critical parameter to ensure product quality; however, extending the inspection process precisely over a large area presents major challenges, due to the limitations of the available inspection equipment. An additional manipulation problem arises when automating the inspection process, as the silicon wafer requires movement constraints to ensure accurate measurements and to prevent damage. Furthermore, there are other increasingly important large-area industrial applications, such as Roll-to-Roll (R2R) manufacturing where coating thickness inspection introduces additional challenges. This paper presents an autonomous inspection system using a robotic manipulator with a novel learned constraint manifold to control a wafer to its calibration point, and a novel multi-sensor array with high potential for scalability into large substrate areas. We demonstrate that the manipulator can perform required motions whilst adhering to movement constraints. We further demonstrate that the sensor array can perform thickness measurements statically with an error of $<2\%$ compared to a commercial reflectometer, and through the use of a manipulator can dynamically detect angle variations $>0.5^\circ$ from the calibration point whilst monitoring the RMSE and $R^2$ over 1406 data points. These features are potentially useful for detecting displacement variations in R2R manufacturing processes.


Meta Learning not to Learn: Robustly Informing Meta-Learning under Nuisance-Varying Families

arXiv.org Artificial Intelligence

In settings where both spurious and causal predictors are available, standard neural networks trained under the objective of empirical risk minimization (ERM) with no additional inductive biases tend to have a dependence on a spurious feature. As a result, it is necessary to integrate additional inductive biases in order to guide the network toward generalizable hypotheses. Often these spurious features are shared across related tasks, such as estimating disease prognoses from image scans coming from different hospitals, making the challenge of generalization more difficult. In these settings, it is important that methods are able to integrate the proper inductive biases to generalize across both nuisance-varying families as well as task families. Motivated by this setting, we present RIME (Robustly Informed Meta lEarning), a new method for meta learning under the presence of both positive and negative inductive biases (what to learn and what not to learn). We first develop a theoretical causal framework showing why existing approaches at knowledge integration can lead to worse performance on distributionally robust objectives. We then show that RIME is able to simultaneously integrate both biases, reaching state of the art performance under distributionally robust objectives in informed meta-learning settings under nuisance-varying families.