Goto

Collaborating Authors

 Information Fusion


Landmark Alternating Diffusion

arXiv.org Machine Learning

Alternating Diffusion (AD) is a commonly applied diffusion-based sensor fusion algorithm. While it has been successfully applied to various problems, its computational burden remains a limitation. Inspired by the landmark diffusion idea considered in the Robust and Scalable Embedding via Landmark Diffusion (ROSELAND), we propose a variation of AD, called Landmark AD (LAD), which captures the essence of AD while offering superior computational efficiency. We provide a series of theoretical analyses of LAD under the manifold setup and apply it to the automatic sleep stage annotation problem with two electroencephalogram channels to demonstrate its application.


A Survey on Intermediate Fusion Methods for Collaborative Perception Categorized by Real World Challenges

arXiv.org Artificial Intelligence

This survey analyzes intermediate fusion methods in collaborative perception for autonomous driving, categorized by real-world challenges. We examine various methods, detailing their features and the evaluation metrics they employ. The focus is on addressing challenges like transmission efficiency, localization errors, communication disruptions, and heterogeneity. Moreover, we explore strategies to counter adversarial attacks and defenses, as well as approaches to adapt to domain shifts. The objective is to present an overview of how intermediate fusion methods effectively meet these diverse challenges, highlighting their role in advancing the field of collaborative perception in autonomous driving.


Enhancing Track Management Systems with Vehicle-To-Vehicle Enabled Sensor Fusion

arXiv.org Artificial Intelligence

In the rapidly advancing landscape of connected and automated vehicles (CAV), the integration of Vehicle-to-Everything (V2X) communication in traditional fusion systems presents a promising avenue for enhancing vehicle perception. Addressing current limitations with vehicle sensing, this paper proposes a novel Vehicle-to-Vehicle (V2V) enabled track management system that leverages the synergy between V2V signals and detections from radar and camera sensors. The core innovation lies in the creation of independent priority track lists, consisting of fused detections validated through V2V communication. This approach enables more flexible and resilient thresholds for track management, particularly in scenarios with numerous occlusions where the tracked objects move outside the field of view of the perception sensors. The proposed system considers the implications of falsification of V2X signals which is combated through an initial vehicle identification process using detection from perception sensors. Presented are the fusion algorithm, simulated environments, and validation mechanisms. Experimental results demonstrate the improved accuracy and robustness of the proposed system in common driving scenarios, highlighting its potential to advance the reliability and efficiency of autonomous vehicles.


Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition

arXiv.org Artificial Intelligence

Combining different sensing modalities with multiple positions helps form a unified perception and understanding of complex situations such as human behavior. Hence, human activity recognition (HAR) benefits from combining redundant and complementary information (Unimodal/Multimodal). Even so, it is not an easy task. It requires a multidisciplinary approach, including expertise in sensor technologies, signal processing, data fusion algorithms, and domain-specific knowledge. This Ph.D. work employs sensing modalities such as inertial, pressure (audio and atmospheric pressure), and textile capacitive sensing for HAR. The scenarios explored are gesture and hand position tracking, facial and head pattern recognition, and body posture and gesture recognition. The selected wearable devices and sensing modalities are fully integrated with machine learning-based algorithms, some of which are implemented in the embedded device, on the edge, and tested in real-time.


Combining Machine Learning with Computational Fluid Dynamics using OpenFOAM and SmartSim

arXiv.org Artificial Intelligence

Combining machine learning (ML) with computational fluid dynamics (CFD) opens many possibilities for improving simulations of technical and natural systems. However, CFD+ML algorithms require exchange of data, synchronization, and calculation on heterogeneous hardware, making their implementation for large-scale problems exceptionally challenging. We provide an effective and scalable solution to developing CFD+ML algorithms using open source software OpenFOAM and SmartSim. SmartSim provides an Orchestrator that significantly simplifies the programming of CFD+ML algorithms and a Redis database that ensures highly scalable data exchange between ML and CFD clients. We show how to leverage SmartSim to effectively couple different segments of OpenFOAM with ML, including pre/post-processing applications, solvers, function objects, and mesh motion solvers. We additionally provide an OpenFOAM sub-module with examples that can be used as starting points for real-world applications in CFD+ML.


A review of deep learning-based information fusion techniques for multimodal medical image classification

arXiv.org Artificial Intelligence

Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep learning-based multimodal fusion techniques have emerged as powerful tools for improving medical image classification. This review offers a thorough analysis of the developments in deep learning-based multimodal fusion for medical classification tasks. We explore the complementary relationships among prevalent clinical modalities and outline three main fusion schemes for multimodal classification networks: input fusion, intermediate fusion (encompassing single-level fusion, hierarchical fusion, and attention-based fusion), and output fusion. By evaluating the performance of these fusion techniques, we provide insight into the suitability of different network architectures for various multimodal fusion scenarios and application domains. Furthermore, we delve into challenges related to network architecture selection, handling incomplete multimodal data management, and the potential limitations of multimodal fusion. Finally, we spotlight the promising future of Transformer-based multimodal fusion techniques and give recommendations for future research in this rapidly evolving field.


Leveraging Speech for Gesture Detection in Multimodal Communication

arXiv.org Artificial Intelligence

Gestures are inherent to human interaction and often complement speech in face-to-face communication, forming a multimodal communication system. An important task in gesture analysis is detecting a gesture's beginning and end. Research on automatic gesture detection has primarily focused on visual and kinematic information to detect a limited set of isolated or silent gestures with low variability, neglecting the integration of speech and vision signals to detect gestures that co-occur with speech. This work addresses this gap by focusing on co-speech gesture detection, emphasising the synchrony between speech and co-speech hand gestures. We address three main challenges: the variability of gesture forms, the temporal misalignment between gesture and speech onsets, and differences in sampling rate between modalities. We investigate extended speech time windows and employ separate backbone models for each modality to address the temporal misalignment and sampling rate differences. We utilize Transformer encoders in cross-modal and early fusion techniques to effectively align and integrate speech and skeletal sequences. The study results show that combining visual and speech information significantly enhances gesture detection performance. Our findings indicate that expanding the speech buffer beyond visual time segments improves performance and that multimodal integration using cross-modal and early fusion techniques outperforms baseline methods using unimodal and late fusion methods. Additionally, we find a correlation between the models' gesture prediction confidence and low-level speech frequency features potentially associated with gestures. Overall, the study provides a better understanding and detection methods for co-speech gestures, facilitating the analysis of multimodal communication.


Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning

arXiv.org Artificial Intelligence

Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world in autonomous systems and cyber-physical systems. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Different from most traditional fusion models that incorporate all modalities identically in neural networks, our model designates a prime modality and regards the remaining modalities as detectors in the information pathway, serving to distill the flow of information. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of multimodal representation learning. Experimental evaluations on the MUStARD, CMU-MOSI, and CMU-MOSEI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks. Remarkably, on the CMU-MOSI dataset, ITHP surpasses human-level performance in the multimodal sentiment binary classification task across all evaluation metrics (i.e., Binary Accuracy, F1 Score, Mean Absolute Error, and Pearson Correlation).


A multi-robot system for the detection of explosive devices

arXiv.org Artificial Intelligence

In order to clear the world of the threat posed by landmines and other explosive devices, robotic systems can play an important role. However, the development of such field robots that need to operate in hazardous conditions requires the careful consideration of multiple aspects related to the perception, mobility, and collaboration capabilities of the system. In the framework of a European challenge, the Artificial Intelligence for Detection of Explosive Devices - eXtended (AIDEDeX) project proposes to design a heterogeneous multi-robot system with advanced sensor fusion algorithms. This system is specifically designed to detect and classify improvised explosive devices, explosive ordnances, and landmines. This project integrates specialised sensors, including electromagnetic induction, ground penetrating radar, X-Ray backscatter imaging, Raman spectrometers, and multimodal cameras, to achieve comprehensive threat identification and localisation. The proposed system comprises a fleet of unmanned ground vehicles and unmanned aerial vehicles. This article details the operational phases of the AIDEDeX system, from rapid terrain exploration using unmanned aerial vehicles to specialised detection and classification by unmanned ground vehicles equipped with a robotic manipulator. Initially focusing on a centralised approach, the project will also explore the potential of a decentralised control architecture, taking inspiration from swarm robotics to provide a robust, adaptable, and scalable solution for explosive detection.


Multifidelity Surrogate Models: A New Data Fusion Perspective

arXiv.org Artificial Intelligence

Multifidelity surrogate modelling combines data of varying accuracy and cost from different sources. It strategically uses low-fidelity models for rapid evaluations, saving computational resources, and high-fidelity models for detailed refinement. It improves decision-making by addressing uncertainties and surpassing the limits of single-fidelity models, which either oversimplify or are computationally intensive. Blending high-fidelity data for detailed responses with frequent low-fidelity data for quick approximations facilitates design optimisation in various domains. Despite progress in interpolation, regression, enhanced sampling, error estimation, variable fidelity, and data fusion techniques, challenges persist in selecting fidelity levels and developing efficient data fusion methods. This study proposes a new fusion approach to construct multi-fidelity surrogate models by constructing gradient-only surrogates that use only gradients to construct regression surfaces. Results are demonstrated on foundational example problems that isolate and illustrate the fusion approach's efficacy, avoiding the need for complex examples that obfuscate the main concept.