Information Fusion
Information Fusion for Assistance Systems in Production Assessment
Arévalo, Fernando, Piolo, Christian Alison M., Ibrahim, M. Tahasanul, Schwung, Andreas
We propose a novel methodology to define assistance systems that rely on information fusion to combine different sources of information while providing an assessment. The main contribution of this paper is a general framework for the fusion of n information sources using evidence theory. The fusion provides a more robust prediction and an associated uncertainty that can be used to assess the likelihood of the prediction. Moreover, we provide a methodology for the information fusion of two primary sources: an ensemble classifier based on machine data and an expert-centered model. We demonstrate the information fusion approach using data from an industrial setup, which rounds out the application part of this research. Furthermore, we address the problem of data drift by proposing a methodology to update the data-based models using an evidence theory approach. We validate the approach on the Tennessee Eastman benchmark and perform an ablation study of the model update parameters.
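As an illustration of fusing n sources with evidence theory, the following minimal sketch applies Dempster's rule of combination pairwise. It is not the authors' framework: the two-hypothesis frame, the mass values, and the names `combine` and `fuse` are assumptions made for the example. The mass left on the full frame is read here as the remaining uncertainty, which is one common interpretation.

```python
from functools import reduce
from itertools import product


def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset hypotheses.

    Returns the normalized fused masses and the conflict mass.
    """
    joint = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {h: w / norm for h, w in joint.items()}, conflict


def fuse(sources):
    """Fuse any number of mass functions by repeated pairwise combination."""
    return reduce(lambda acc, m: combine(acc, m)[0], sources)


# Hypothetical frame of discernment and mass values (illustration only).
OK = frozenset({"ok"})
FAULT = frozenset({"fault"})
THETA = OK | FAULT  # full frame: mass here means "undecided"

m_classifier = {OK: 0.7, FAULT: 0.2, THETA: 0.1}
m_expert = {OK: 0.6, FAULT: 0.1, THETA: 0.3}

fused, conflict = combine(m_classifier, m_expert)
fused_all = fuse([m_classifier, m_expert])
```

Dempster's rule is associative, so `fuse` can fold over the sources in any order; a high `conflict` value signals that the sources disagree strongly and the normalized result should be treated with care.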
Transformer-based interpretable multi-modal data fusion for skin lesion classification
Cheslerean-Boghiu, Theodor, Fleischmann, Melia-Evelina, Willem, Theresa, Lasser, Tobias
Much current deep learning (DL) research focuses on improving quantitative metrics regardless of other factors. In human-centered applications, like skin lesion classification in dermatology, DL-driven clinical decision support systems are still in their infancy due to the limited transparency of their decision-making process. Moreover, the lack of procedures that can explain the behavior of trained DL algorithms leads to almost no trust from clinical physicians. To diagnose skin lesions, dermatologists rely on visual assessment of the disease and the data gathered from the patient's anamnesis. Data-driven algorithms dealing with multi-modal data are limited by the separation of feature-level and decision-level fusion procedures required by convolutional architectures. To address this issue, we enable single-stage multi-modal data fusion via the attention mechanism of transformer-based architectures to aid in diagnosing skin diseases. Our method beats other state-of-the-art single- and multi-modal DL architectures in image-rich and patient-data-rich environments. Additionally, the choice of the architecture enables native interpretability support for the classification task in both the image and metadata domains with no additional modifications necessary.
Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters
Zhang, Xinyun, Tan, Haochen, Wu, Han, Zhan, Mingjie, Liang, Ding, Yu, Bei
Humans learn language via multi-modal knowledge. However, due to the text-only pre-training scheme, most existing pre-trained language models (PLMs) have no access to multi-modal information. To inject visual knowledge into PLMs, existing methods incorporate either the text or image encoder of vision-language models (VLMs) to encode the visual information and update all the original parameters of PLMs for knowledge fusion. In this paper, we propose a new plug-and-play module, X-adapter, to flexibly leverage the aligned visual and textual knowledge learned in pre-trained VLMs and efficiently inject it into PLMs. Specifically, we insert X-adapters into PLMs, and only the added parameters are updated during adaptation. To fully exploit the potential of VLMs, X-adapters consist of two sub-modules, V-expert and T-expert, to fuse VLMs' image and text representations, respectively. Different sub-modules can be activated depending on the downstream task. Experimental results show that our method can significantly improve the performance on object-color reasoning and natural language understanding (NLU) tasks compared with PLM baselines.
Extrinsic Calibration of 2D Millimetre-Wavelength Radar Pairs Using Ego-Velocity Estimates
Cheng, Qilong, Wise, Emmett, Kelly, Jonathan
Correct radar data fusion depends on knowledge of the spatial transform between sensor pairs. Current methods for determining this transform operate by aligning identifiable features in different radar scans, or by relying on measurements from another, more accurate sensor. Feature-based alignment requires the sensors to have overlapping fields of view or necessitates the construction of an environment map. Several existing techniques require bespoke retroreflective radar targets. These requirements limit both where and how calibration can be performed. In this paper, we take a different approach: instead of attempting to track targets or features, we rely on ego-velocity estimates from each radar to perform calibration. Our method enables calibration of a subset of the transform parameters, including the yaw and the axis of translation between the radar pair, without the need for a shared field of view or for specialized targets. In general, the yaw and the axis of translation are the most important parameters for data fusion, the most likely to vary over time, and the most difficult to calibrate manually. We formulate calibration as a batch optimization problem, show that the radar-radar system is identifiable, and specify the platform excitation requirements. Through simulation studies and real-world experiments, we establish that our method is more reliable and accurate than state-of-the-art methods. Finally, we demonstrate that the full rigid body transform can be recovered if relatively coarse information about the platform rotation rate is available.
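To make the idea of calibrating from ego-velocity concrete, here is a deliberately simplified sketch. It is not the batch optimization described in the abstract. It assumes planar rigid-body kinematics and a purely translating platform (zero rotation rate), so the lever-arm term drops out and only the relative yaw between the two radars can be estimated. The function names and the sample velocities are hypothetical.

```python
import math


def rotate(v, angle):
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def relative_yaw(vel_a, vel_b):
    """Least-squares angle psi such that vel_a[i] ~= R(psi) * vel_b[i].

    Assumes both lists hold time-aligned 2D ego-velocity estimates taken
    while the platform is not rotating.
    """
    cross = sum(bx * ay - by * ax for (ax, ay), (bx, by) in zip(vel_a, vel_b))
    dot = sum(bx * ax + by * ay for (ax, ay), (bx, by) in zip(vel_a, vel_b))
    return math.atan2(cross, dot)


# Synthetic, noise-free example: both radars observe the same platform
# velocity, each expressed in its own (yawed) sensor frame.
YAW_A, YAW_B = 0.1, 0.5  # hypothetical mounting yaws in radians
platform_vel = [(1.0, 0.0), (0.8, 0.3), (0.2, -0.5)]
vel_a = [rotate(v, -YAW_A) for v in platform_vel]
vel_b = [rotate(v, -YAW_B) for v in platform_vel]

psi = relative_yaw(vel_a, vel_b)  # close to YAW_B - YAW_A
```

Under this simplified model, the offset between the sensors only enters the measurements through the platform rotation rate, so the sketch cannot recover any translation parameters.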
A Medical Image Fusion Method based on MDLatLRRv2
Song, Xu, Wu, Xiao-Jun, Li, Hui
Since MDLatLRR only considers the detail parts (salient features) of input images extracted by latent low-rank representation (LatLRR), it does not make effective use of the base parts (principal features) extracted by LatLRR. Therefore, we propose an improved multi-level decomposition method called MDLatLRRv2, which effectively analyzes and utilizes all the image features obtained by LatLRR. We then apply MDLatLRRv2 to medical image fusion. The base parts are fused by an averaging strategy and the detail parts are fused by a nuclear-norm operation. Comparison with existing methods demonstrates that the proposed method achieves state-of-the-art fusion performance in both objective and subjective assessment.
Leveraging Knowledge and Reinforcement Learning for Enhanced Reliability of Language Models
Tyagi, Nancy, Sarkar, Surjodeep, Gaur, Manas
The Natural Language Processing (NLP) community has been using crowdsourcing techniques to create benchmark datasets such as General Language Understanding Evaluation (GLUE) for training modern Language Models (LMs) such as BERT. GLUE tasks measure reliability scores using inter-annotator metrics, i.e., Cohen's kappa. However, the reliability aspect of LMs has often been overlooked. To counter this problem, we explore a knowledge-guided LM ensembling approach that leverages reinforcement learning to integrate knowledge from ConceptNet and Wikipedia as knowledge graph embeddings. This approach mimics human annotators resorting to external knowledge to compensate for information deficits in the datasets. Across nine GLUE datasets, our research shows that ensembling strengthens reliability and accuracy scores, outperforming the state of the art.
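For reference, Cohen's kappa, the inter-annotator metric mentioned above, compares observed agreement with the agreement expected by chance. The sketch below uses the standard two-rater formula; the label lists are made up for illustration and do not come from GLUE.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary annotations from two raters.
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohens_kappa(rater_a, rater_b)  # observed 0.8, expected 0.5
```

Here the raters agree on 8 of 10 items, but half of that agreement is expected by chance, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.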
UWB Ranging and IMU Data Fusion: Overview and Nonlinear Stochastic Filter for Inertial Navigation
Hashim, Hashim A., Eltoukhy, Abdelrahman E. E., Vamvoudakis, Kyriakos G.
This paper proposes a nonlinear stochastic complementary filter design for inertial navigation that takes advantage of a fusion of Ultra-wideband (UWB) and Inertial Measurement Unit (IMU) technology, ensuring semi-global uniform ultimate boundedness (SGUUB) of the closed-loop error signals in mean square. The proposed filter estimates the vehicle's orientation, position, linear velocity, and noise covariance. The filter is designed to mimic the nonlinear navigation motion kinematics and is posed on a matrix Lie Group, the extended form of the Special Euclidean Group $\mathbb{SE}_{2}\left(3\right)$. The Lie Group based structure of the proposed filter provides a unique and global representation, avoiding singularity (a common shortcoming of Euler angles) as well as non-uniqueness (a common limitation of unit quaternions). Unlike Kalman-type filters, the proposed filter successfully addresses IMU measurement noise with unknown upper-bounded covariance. Although the navigation estimator is proposed in continuous form, the discrete version is also presented. Moreover, the unit-quaternion implementation is provided in the Appendix. Experimental validation using a publicly available real-world six-degrees-of-freedom (6 DoF) flight dataset obtained from an unmanned Micro Aerial Vehicle (MAV) illustrates the robustness of the proposed navigation technique. Keywords: Sensor-fusion, Inertial navigation, Ultra-wideband ranging, Inertial measurement unit, Stochastic differential equation, Stability, Localization, Observer design.
Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review
Yao, Shanliang, Guan, Runwei, Huang, Xiaoyu, Li, Zhuoxiao, Sha, Xiangyu, Yue, Yong, Lim, Eng Gee, Seo, Hyungjoon, Man, Ka Lok, Zhu, Xiaohui, Yue, Yutao
Driven by deep learning techniques, perception technology in autonomous driving has developed rapidly in recent years, enabling vehicles to accurately detect and interpret the surrounding environment for safe and efficient navigation. To achieve accurate and robust perception capabilities, autonomous vehicles are often equipped with multiple sensors, making sensor fusion a crucial part of the perception system. Among these fused sensors, radars and cameras enable a complementary and cost-effective perception of the surrounding environment regardless of lighting and weather conditions. This review aims to provide a comprehensive guideline for radar-camera fusion, particularly concentrating on perception tasks related to object detection and semantic segmentation. Based on the principles of the radar and camera sensors, we delve into the data processing pipeline and representations, followed by an in-depth analysis and summary of radar-camera fusion datasets. In the review of methodologies in radar-camera fusion, we address the questions "why to fuse", "what to fuse", "where to fuse", "when to fuse", and "how to fuse", and subsequently discuss various challenges and potential research directions within this domain. To ease the retrieval and comparison of datasets and fusion methods, we also provide an interactive website: https://radar-camera-fusion.github.io.
Path-Constrained State Estimation for Rail Vehicles
von Einem, Cornelius, Cramariuc, Andrei, Siegwart, Roland, Cadena, Cesar, Tschopp, Florian
Globally rising demand for transportation by rail is pushing existing infrastructure to its capacity limits, necessitating the development of accurate, robust, and high-frequency positioning systems to ensure safe and efficient train operation. As individual sensor modalities cannot satisfy the strict requirements of robustness and safety, a combination thereof is required. We propose a path-constrained sensor fusion framework to integrate various modalities while leveraging the unique characteristics of the railway network. To reflect the constrained motion of rail vehicles along their tracks, the state is modeled in 1D along the track geometry. We further leverage the limited action space of a train by employing a novel multi-hypothesis tracking approach to account for the multiple possible trajectories a vehicle can take through the railway network. We demonstrate the reliability and accuracy of our fusion framework on multiple tram datasets recorded in the city of Zurich, utilizing Visual-Inertial Odometry for local motion estimation and a standard GNSS for global localization. We evaluate our results using ground truth localizations recorded with an RTK-GNSS, and compare our method to standard baselines. A Root Mean Square Error of 4.78 m and a track selectivity score of up to 94.9 % have been achieved.
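One way to picture a state modeled in 1D along the track geometry is to project a 2D position fix onto a polyline and keep only the arc length. The sketch below does this under simplifying assumptions: a single track given as a list of planar points, with the helper name `project_to_track` and the coordinates chosen for illustration. It is not the authors' framework and does not cover the multi-hypothesis tracking over several candidate tracks.

```python
import math


def project_to_track(track, point):
    """Project a 2D point onto a polyline track.

    Returns (arc_length, lateral_distance) of the closest point on the
    track, where arc_length is measured from the first vertex.
    """
    best_s, best_d = 0.0, math.inf
    s_start = 0.0  # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicate vertices
        # Normalized position of the projection, clamped to the segment.
        t = ((point[0] - x0) * dx + (point[1] - y0) * dy) / (seg_len ** 2)
        t = min(1.0, max(0.0, t))
        d = math.hypot(point[0] - (x0 + t * dx), point[1] - (y0 + t * dy))
        if d < best_d:
            best_s, best_d = s_start + t * seg_len, d
        s_start += seg_len
    return best_s, best_d


# Hypothetical L-shaped track in metres and two position fixes.
TRACK = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
s1, d1 = project_to_track(TRACK, (4.0, 3.0))   # closest to the first segment
s2, d2 = project_to_track(TRACK, (12.0, 5.0))  # closest to the second segment
```

The lateral distance is a useful by-product: with several candidate tracks, a large value for one of them hints that the vehicle is more likely on another.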
Data Fusion in Neuromarketing: Multimodal Analysis of Biosignals, Lifecycle Stages, Current Advances, Datasets, Trends, and Challenges
Pérez, Mario Quiles, Beltrán, Enrique Tomás Martínez, Bernal, Sergio López, Prat, Eduardo Horna, Del Campo, Luis Montesano, Maimó, Lorenzo Fernández, Celdrán, Alberto Huertas
The primary goal of any company is to increase its profits by improving both the quality of its products and how they are advertised. In this context, neuromarketing seeks to enhance the promotion of products and generate greater acceptance among potential buyers. Traditionally, neuromarketing studies have relied on a single biosignal to obtain feedback from presented stimuli. However, thanks to new devices and technological advances in this area, recent trends indicate a shift towards the fusion of diverse biosignals. An example is the use of electroencephalography to understand the impact of an advertisement at the neural level, combined with visual tracking to identify the stimuli that induce such impacts. This emerging pattern determines which biosignals to employ for achieving specific neuromarketing objectives. Furthermore, the fusion of data from multiple sources demands advanced processing methodologies. Despite these complexities, there is a lack of literature that adequately collates and organizes the various data sources and the processing techniques applied for the research objectives pursued. To address these challenges, the current paper conducts a comprehensive analysis of the objectives, biosignals, and data processing techniques employed in neuromarketing research. This study provides both the technical definition and a graphical distribution of the elements under review. Additionally, it presents a categorization based on research objectives and provides an overview of the combinatory methodologies employed. The paper then examines the primary public datasets designed for neuromarketing research, together with others whose main purpose is not neuromarketing but which can be used for this purpose. Ultimately, this work provides a historical perspective on the evolution of techniques across various phases over recent years and enumerates key lessons learned.