Goto

Collaborating Authors

 Information Fusion


Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving

arXiv.org Artificial Intelligence

Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle. With the rapid advancement in connected technologies, such as vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, valuable information from alternate views becomes accessible via wireless networks. The integration of information from alternative views has the potential to overcome the inherent limitations associated with a single viewpoint, such as occlusions and limited field of view. In this work, we introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models. Unlike previous approaches where the multi-view data is manually fused or formulated as a separate training stage, our model supports end-to-end training, enhancing both flexibility and performance. Moreover, the predicted multimodal trajectories are calibrated by a post-hoc conformal prediction module to get valid and efficient confidence regions. We evaluated the entire framework using the real-world V2I dataset V2X-Seq. Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU. The code is publicly available at: \url{https://github.com/xichennn/V2I_trajectory_prediction}.


Identifying the Hierarchical Emotional Areas in the Human Brain Through Information Fusion

arXiv.org Artificial Intelligence

The brain basis of emotion has consistently received widespread attention, attracting a large number of studies to explore this cutting-edge topic. However, the methods employed in these studies typically only model the pairwise relationship between two brain regions, while neglecting the interactions and information fusion among multiple brain regions$\unicode{x2014}$one of the key ideas of the psychological constructionist hypothesis. To overcome the limitations of traditional methods, this study provides an in-depth theoretical analysis of how to maximize interactions and information fusion among brain regions. Building on the results of this analysis, we propose to identify the hierarchical emotional areas in the human brain through multi-source information fusion and graph machine learning methods. Comprehensive experiments reveal that the identified hierarchical emotional areas, from lower to higher levels, primarily facilitate the fundamental process of emotion perception, the construction of basic psychological operations, and the coordination and integration of these operations. Overall, our findings provide unique insights into the brain mechanisms underlying specific emotions based on the psychological constructionist hypothesis.


Distances Between Partial Preference Orderings

arXiv.org Artificial Intelligence

This paper proposes to establish the distance between partial preference orderings based on two very different approaches. The first approach corresponds to the brute force method based on combinatorics. It generates all possible complete preference orderings compatible with the partial preference orderings and calculates the Frobenius distance between all fully compatible preference orderings. Unfortunately, this first method is not very efficient in solving high-dimensional problems because of its big combinatorial complexity. That is why we propose to circumvent this problem by using a second approach based on belief functions, which can adequately model the missing information of partial preference orderings. This second approach to the calculation of distance does not suffer from combinatorial complexity limitation. We show through simple examples how these two theoretical methods work.


Appformer: A Novel Framework for Mobile App Usage Prediction Leveraging Progressive Multi-Modal Data Fusion and Feature Extraction

arXiv.org Artificial Intelligence

This article presents Appformer, a novel mobile application prediction framework inspired by the efficiency of Transformer-like architectures in processing sequential data through self-attention mechanisms. Combining a Multi-Modal Data Progressive Fusion Module with a sophisticated Feature Extraction Module, Appformer leverages the synergies of multi-modal data fusion and data mining techniques while maintaining user privacy. The framework employs Points of Interest (POIs) associated with base stations, optimizing them through comprehensive comparative experiments to identify the most effective clustering method. These refined inputs are seamlessly integrated into the initial phases of cross-modal data fusion, where temporal units are encoded via word embeddings and subsequently merged in later stages. The Feature Extraction Module, employing Transformer-like architectures specialized for time series analysis, adeptly distils comprehensive features. It meticulously fine-tunes the outputs from the fusion module, facilitating the extraction of high-calibre, multi-modal features, thus guaranteeing a robust and efficient extraction process. Extensive experimental validation confirms Appformer's effectiveness, attaining state-of-the-art (SOTA) metrics in mobile app usage prediction, thereby signifying a notable progression in this field.


IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment

arXiv.org Artificial Intelligence

Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where the entities can be associated with related images. Most existing studies integrate multi-modal information heavily relying on the automatically-learned fusion module, rarely suppressing the redundant information for MMEA explicitly. To this end, we explore variational information bottleneck for multi-modal entity alignment (IBMEA), which emphasizes the alignment-relevant information and suppresses the alignment-irrelevant information in generating entity representations. Specifically, we devise multi-modal variational encoders to generate modal-specific entity representations as probability distributions. Then, we propose four modal-specific information bottleneck regularizers, limiting the misleading clues in refining modal-specific entity representations. Finally, we propose a modal-hybrid information contrastive regularizer to integrate all the refined modal-specific representations, enhancing the entity similarity between MMKGs to achieve MMEA. We conduct extensive experiments on two cross-KG and three bilingual MMEA datasets. Experimental results demonstrate that our model consistently outperforms previous state-of-the-art methods, and also shows promising and robust performance in low-resource and high-noise data scenarios.


Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges

arXiv.org Artificial Intelligence

The application of machine learning (ML) in detecting, diagnosing, and treating mental health disorders is garnering increasi ng attention. Traditionally, research has focused on single modalities, such as text from clinical notes, audio from speech samples, or video of interaction patterns. Recently, multimodal ML, which combines information from multiple modalities, has demonstrated significant promise in offering novel insights into human behavior patterns and recognizing mental health symptoms and risk factors. Despite its potential, multimodal ML in mental health remains an emerging field, facing several complex challenges before practical applications can be effectively developed. This survey provides a comprehensive overview of the data availability a nd current state-of-the-art multimodal ML applications for mental health. It discusses key challenges that must be addressed to advance the field.


Modality-Order Matters! A Novel Hierarchical Feature Fusion Method for CoSAm: A Code-Switched Autism Corpus

arXiv.org Artificial Intelligence

Autism Spectrum Disorder (ASD) is a complex neuro-developmental challenge, presenting a spectrum of difficulties in social interaction, communication, and the expression of repetitive behaviors in different situations. This increasing prevalence underscores the importance of ASD as a major public health concern and the need for comprehensive research initiatives to advance our understanding of the disorder and its early detection methods. This study introduces a novel hierarchical feature fusion method aimed at enhancing the early detection of ASD in children through the analysis of code-switched speech (English and Hindi). Employing advanced audio processing techniques, the research integrates acoustic, paralinguistic, and linguistic information using Transformer Encoders. This innovative fusion strategy is designed to improve classification robustness and accuracy, crucial for early and precise ASD identification. The methodology involves collecting a code-switched speech corpus, CoSAm, from children diagnosed with ASD and a matched control group. The dataset comprises 61 voice recordings from 30 children diagnosed with ASD and 31 from neurotypical children, aged between 3 and 13 years, resulting in a total of 159.75 minutes of voice recordings. The feature analysis focuses on MFCCs and extensive statistical attributes to capture speech pattern variability and complexity. The best model performance is achieved using a hierarchical fusion technique with an accuracy of 98.75% using a combination of acoustic and linguistic features first, followed by paralinguistic features in a hierarchical manner.


Double-Layer Soft Data Fusion for Indoor Robot WiFi-Visual Localization

arXiv.org Artificial Intelligence

This paper presents a novel WiFi-Visual data fusion method for indoor robot (TIAGO++) localization. This method can use 10 WiFi samples and 4 low-resolution images ($58 \times 58$ in pixels) to localize a indoor robot with an average error distance about 1.32 meters. The experiment test is 3 months after the data collection in a general teaching building, whose WiFi and visual environments are partially changed. This indirectly shows the robustness of the proposed method. Instead of neural network design, this paper focuses on the soft data fusion to prevent unbounded errors in visual localization. A double-layer soft data fusion is proposed. The proposed soft data fusion includes the first-layer WiFi-Visual feature fusion and the second-layer decision vector fusion. Firstly, motivated by the excellent capability of neural network in image processing and recognition, the temporal-spatial features are extracted from WiFi data, these features are represented in image form. Secondly, the WiFi temporal-spatial features in image form and the visual features taken by the robot camera are combined together, and are jointly exploited by a classification neural network to produce a likelihood vector for WiFi-Visual localization. This is called first-layer WiFi-Visual fusion. Similarly, these two types of features can exploited separately by neural networks to produce another two independent likelihood vectors. Thirdly, the three likelihood vectors are fused by Hadamard product and median filtering to produce the final likelihood vector for localization. This called the second-layer decision vector fusion. The proposed soft data fusion does not apply any threshold or prioritize any data source over the other in the fusion process. It never excludes the positions of low probabilities, which can avoid the information loss due to a hard decision. The demo video is provided. The code will be open.


Semantic-Aware Representation of Multi-Modal Data for Data Ingress: A Literature Review

arXiv.org Artificial Intelligence

Machine Learning (ML) is continuously permeating a growing amount of application domains. Generative AI such as Large Language Models (LLMs) also sees broad adoption to process multi-modal data such as text, images, audio, and video. While the trend is to use ever-larger datasets for training, managing this data efficiently has become a significant practical challenge in the industry-double as much data is certainly not double as good. Rather the opposite is important since getting an understanding of the inherent quality and diversity of the underlying data lakes is a growing challenge for application-specific ML as well as for fine-tuning foundation models. Furthermore, information retrieval (IR) from expanding data lakes is complicated by the temporal dimension inherent in time-series data which must be considered to determine its semantic value. This study focuses on the different semantic-aware techniques to extract embeddings from mono-modal, multi-modal, and cross-modal data to enhance IR capabilities in a growing data lake. Articles were collected to summarize information about the state-of-the-art techniques focusing on applications of embedding for three different categories of data modalities.


Fusion LiDAR-Inertial-Encoder data for High-Accuracy SLAM

arXiv.org Artificial Intelligence

In the realm of robotics, achieving simultaneous localization and mapping (SLAM) is paramount for autonomous navigation, especially in challenging environments like texture-less structures. This paper proposed a factor-graph-based model that tightly integrates IMU and encoder sensors to enhance positioning in such environments. The system operates by meticulously evaluating the data from each sensor. Based on these evaluations, weights are dynamically adjusted to prioritize the more reliable source of information at any given moment. The robot's state is initialized using IMU data, while the encoder aids motion estimation in long corridors. Discrepancies between the two states are used to correct IMU drift. The effectiveness of this method is demonstrably validated through experimentation. Compared to Karto SLAM, a widely used SLAM algorithm, this approach achieves an improvement of 26.98% in rotation angle error and 67.68% reduction in position error. These results convincingly demonstrate the method's superior accuracy and robustness in texture-less environments.