Information Fusion
Diagnosing Alzheimer's Disease using Early-Late Multimodal Data Fusion with Jacobian Maps
Alzheimer's disease (AD) is a prevalent and debilitating neurodegenerative disorder impacting a large aging population. Detecting AD in all its presymptomatic and symptomatic stages is crucial for early intervention and treatment. An active research direction is to explore machine learning methods that harness multimodal data fusion to outperform human inspection of medical scans. However, existing multimodal fusion models have limitations, including redundant computation, complex architecture, and simplistic handling of missing data. Moreover, the preprocessing pipelines of medical scans remain inadequately detailed and are seldom optimized for individual subjects. In this paper, we propose an efficient early-late fusion (ELF) approach, which leverages a convolutional neural network for automated feature extraction and random forests for their competitive performance on small datasets. Additionally, we introduce a robust preprocessing pipeline that adapts to the unique characteristics of individual subjects and makes use of whole brain images rather than slices or patches. Moreover, to tackle the challenge of detecting subtle changes in brain volume, we transform images into the Jacobian domain (JD) to enhance both accuracy and robustness in our classification. Using MRI and CT images from the OASIS-3 dataset, our experiments demonstrate the effectiveness of the ELF approach in classifying AD into four stages with an accuracy of 97.19%.
CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration
Kang, Shuhao, Liao, Youqi, Li, Jianping, Liang, Fuxun, Li, Yuhao, Li, Fangning, Dong, Zhen, Yang, Bisheng
Image-to-point cloud (I2P) registration is a fundamental task in the field of autonomous vehicles and transportation systems for cross-modality data fusion and localization. Existing I2P registration methods estimate correspondences at the point/pixel level, often overlooking global alignment. However, I2P matching can easily converge to a local optimum when performed without high-level guidance from global constraints. To address this issue, this paper introduces CoFiI2P, a novel I2P registration network that extracts correspondences in a coarse-to-fine manner to achieve the globally optimal solution. First, the image and point cloud data are processed through a Siamese encoder-decoder network for hierarchical feature extraction. Second, a coarse-to-fine matching module is designed to leverage these features and establish robust feature correspondences. Specifically, In the coarse matching phase, a novel I2P transformer module is employed to capture both homogeneous and heterogeneous global information from the image and point cloud data. This enables the estimation of coarse super-point/super-pixel matching pairs with discriminative descriptors. In the fine matching module, point/pixel pairs are established with the guidance of super-point/super-pixel correspondences. Finally, based on matching pairs, the transform matrix is estimated with the EPnP-RANSAC algorithm. Extensive experiments conducted on the KITTI dataset demonstrate that CoFiI2P achieves impressive results, with a relative rotation error (RRE) of 1.14 degrees and a relative translation error (RTE) of 0.29 meters. These results represent a significant improvement of 84\% in RRE and 89\% in RTE compared to the current state-of-the-art (SOTA) method. Qualitative results are available at https://youtu.be/ovbedasXuZE. The source code will be publicly released at https://github.com/kang-1-2-3/CoFiI2P.
Efficient Data Fusion using the Tsetlin Machine
Saha, Rupsa, Zadorozhny, Vladimir I., Granmo, Ole-Christoffer
We propose a novel way of assessing and fusing noisy dynamic data using a Tsetlin Machine. Our approach consists in monitoring how explanations in form of logical clauses that a TM learns changes with possible noise in dynamic data. This way TM can recognize the noise by lowering weights of previously learned clauses, or reflect it in the form of new clauses. We also perform a comprehensive experimental study using notably different datasets that demonstrated high performance of the proposed approach.
FGO-ILNS: Tightly Coupled Multi-Sensor Integrated Navigation System Based on Factor Graph Optimization for Autonomous Underwater Vehicle
Song, Jiangbo, Li, Wanqing, Liu, Ruofan, Zhu, Xiangwei
Multi-sensor fusion is an effective way to enhance the positioning performance of autonomous underwater vehicles (AUVs). However, underwater multi-sensor fusion faces challenges such as heterogeneous frequency and dynamic availability of sensors. Traditional filter-based algorithms suffer from low accuracy and robustness when sensors become unavailable. The factor graph optimization (FGO) can enable multi-sensor plug-and-play despite data frequency. Therefore, we present an FGO-based strapdown inertial navigation system (SINS) and long baseline location (LBL) system tightly coupled navigation system (FGO-ILNS). Sensors such as Doppler velocity log (DVL), magnetic compass pilot (MCP), pressure sensor (PS), and global navigation satellite system (GNSS) can be tightly coupled with FGO-ILNS to satisfy different navigation scenarios. In this system, we propose a floating LBL slant range difference factor model tightly coupled with IMU preintegration factor to achieve unification of global position above and below water. Furthermore, to address the issue of sensor measurements not being synchronized with the LBL during fusion, we employ forward-backward IMU preintegration to construct sensor factors such as GNSS and DVL. Moreover, we utilize the marginalization method to reduce the computational load of factor graph optimization. Simulation and public KAIST dataset experiments have verified that, compared to filter-based algorithms like the extended Kalman filter and federal Kalman filter, as well as the state-of-the-art optimization-based algorithm ORB-SLAM3, our proposed FGO-ILNS leads in accuracy and robustness.
MMTF-DES: A Fusion of Multimodal Transformer Models for Desire, Emotion, and Sentiment Analysis of Social Media Data
Aziz, Abdul, Chowdhury, Nihad Karim, Kabir, Muhammad Ashad, Chy, Abu Nowshed, Siddique, Md. Jawad
Desire is a set of human aspirations and wishes that comprise verbal and cognitive aspects that drive human feelings and behaviors, distinguishing humans from other animals. Understanding human desire has the potential to be one of the most fascinating and challenging research domains. It is tightly coupled with sentiment analysis and emotion recognition tasks. It is beneficial for increasing human-computer interactions, recognizing human emotional intelligence, understanding interpersonal relationships, and making decisions. However, understanding human desire is challenging and under-explored because ways of eliciting desire might be different among humans. The task gets more difficult due to the diverse cultures, countries, and languages. Prior studies overlooked the use of image-text pairwise feature representation, which is crucial for the task of human desire understanding. In this research, we have proposed a unified multimodal transformer-based framework with image-text pair settings to identify human desire, sentiment, and emotion. The core of our proposed method lies in the encoder module, which is built using two state-of-the-art multimodal transformer models. These models allow us to extract diverse features. To effectively extract visual and contextualized embedding features from social media image and text pairs, we conducted joint fine-tuning of two pre-trained multimodal transformer models: Vision-and-Language Transformer (ViLT) and Vision-and-Augmented-Language Transformer (VAuLT). Subsequently, we use an early fusion strategy on these embedding features to obtain combined diverse feature representations of the image-text pair. This consolidation incorporates diverse information about this task, enabling us to robustly perceive the context and image pair from multiple perspectives.
Revisiting multi-GNSS Navigation for UAVs -- An Equivariant Filtering Approach
Scheiber, Martin, Fornasier, Alessandro, Brommer, Christian, Weiss, Stephan
In this work, we explore the recent advances in equivariant filtering for inertial navigation systems to improve state estimation for uncrewed aerial vehicles (UAVs). Traditional state-of-the-art estimation methods, e.g., the multiplicative Kalman filter (MEKF), have some limitations concerning their consistency, errors in the initial state estimate, and convergence performance. Symmetry-based methods, such as the equivariant filter (EqF), offer significant advantages for these points by exploiting the mathematical properties of the system - its symmetry. These filters yield faster convergence rates and robustness to wrong initial state estimates through their error definition. To demonstrate the usability of EqFs, we focus on the sensor-fusion problem with the most common sensors in outdoor robotics: global navigation satellite system (GNSS) sensors and an inertial measurement unit (IMU). We provide an implementation of such an EqF leveraging the semi-direct product of the symmetry group to derive the filter equations. To validate the practical usability of EqFs in real-world scenarios, we evaluate our method using data from all outdoor runs of the INSANE Dataset. Our results demonstrate the performance improvements of the EqF in real-world environments, highlighting its potential for enhancing state estimation for UAVs.
A Machine Learning Data Fusion Model for Soil Moisture Retrieval
Batchu, Vishal, Nearing, Grey, Gulshan, Varun
Soil moisture is one of the primary hydrological state (memory) variables in terrestrial systems (Dobriyal et al. 2012; Rossato et al. 2017a), and is one of the primary controls for agriculture and water management (Dobriyal et al. 2012; Rossato et al. 2017b). Soil moisture affects evapotranspiration and vegetation water availability, which are at the core of the climate-carbon cycle (Falloon et al. 2011) and play an important role in hydrological risks such as floods, drought, erosion, and landslides (Kim et al. 2019; Legates et al. 2011; Tramblay et al. 2012). Accurate measurement of soil moisture has numerous downstream benefits (Moran et al. 2015) including reduced water wastage by better understanding and managing the consumption of water (Brocca et al. 2018; Foster, Mieno, and Brozović 2020), utilising smarter irrigation methods (Kumar et al. 2014) and effective canal water management (Zafar, Prathapar, and Bastiaanssen 2021). The most accurate way to measure soil moisture is via ground-based methods such as direct gravimetric measurements (Klute 1986) or indirect methods such as dielectric reflectometry, capacitance charge, etc. (Bittelli 2011), which in-situ sensors utilize (Walker, Willgoose, and Kalma 2004). However, in-situ sensors are difficult to scale spatially, and are expensive to install and maintain.
Universal Multi-modal Entity Alignment via Iteratively Fusing Modality Similarity Paths
Zhu, Bolin, Liu, Xiaoze, Mao, Xin, Chen, Zhuo, Guo, Lingbing, Gui, Tao, Zhang, Qi
The objective of Entity Alignment (EA) is to identify equivalent entity pairs from multiple Knowledge Graphs (KGs) and create a more comprehensive and unified KG. The majority of EA methods have primarily focused on the structural modality of KGs, lacking exploration of multi-modal information. A few multi-modal EA methods have made good attempts in this field. Still, they have two shortcomings: (1) inconsistent and inefficient modality modeling that designs complex and distinct models for each modality; (2) ineffective modality fusion due to the heterogeneous nature of modalities in EA. To tackle these challenges, we propose PathFusion, consisting of two main components: (1) MSP, a unified modeling approach that simplifies the alignment process by constructing paths connecting entities and modality nodes to represent multiple modalities; (2) IRF, an iterative fusion method that effectively combines information from different modalities using the path as an information carrier. Experimental results on real-world datasets demonstrate the superiority of PathFusion over state-of-the-art methods, with 22.4%-28.9% absolute improvement on Hits@1, and 0.194-0.245 absolute improvement on MRR.
Multi-View Class Incremental Learning
Li, Depeng, Wang, Tianqi, Chen, Junwei, Kawaguchi, Kenji, Lian, Cheng, Zeng, Zhigang
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance. To make MVL methods more practical in an open-ended environment, this paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views, requiring no access to earlier views of data. However, MVCIL is challenged by the catastrophic forgetting of old information and the interference with learning new concepts. To address this, we first develop a randomization-based representation learning technique serving for feature extraction to guarantee their separate view-optimal working states, during which multiple views belonging to a class are presented sequentially; Then, we integrate them one by one in the orthogonality fusion subspace spanned by the extracted features; Finally, we introduce selective weight consolidation for learning-without-forgetting decision-making while encountering new classes. Extensive experiments on synthetic and real-world datasets validate the effectiveness of our approach.
Multimodal Active Measurement for Human Mesh Recovery in Close Proximity
Maeda, Takahiro, Takeshita, Keisuke, Tanaka, Kazuhito
For safe and sophisticated physical human-robot interactions (pHRI), a robot needs to estimate the accurate body pose or mesh of the target person. However, in these pHRI scenarios, the robot cannot fully observe the target person's body with equipped cameras because the target person is usually close to the robot. This leads to severe truncation and occlusions, and results in poor accuracy of human pose estimation. For better accuracy of human pose estimation or mesh recovery on this limited information from cameras, we propose an active measurement and sensor fusion framework of the equipped cameras and other sensors such as touch sensors and 2D LiDAR. These touch and LiDAR sensing are obtained attendantly through pHRI without additional costs. These sensor measurements are sparse but reliable and informative cues for human mesh recovery. In our active measurement process, camera viewpoints and sensor placements are optimized based on the uncertainty of the estimated pose, which is closely related to the truncated or occluded areas. In our sensor fusion process, we fuse the sensor measurements to the camera-based estimated pose by minimizing the distance between the estimated mesh and measured positions. Our method is agnostic to robot configurations. Experiments were conducted using the Toyota Human Support Robot, which has a camera, 2D LiDAR, and a touch sensor on the robot arm. Our proposed method demonstrated the superiority in the human pose estimation accuracy on the quantitative comparison. Furthermore, our proposed method reliably estimated the pose of the target person in practical settings such as target people occluded by a blanket and standing aid with the robot arm.