Information Fusion
Engineering Artificial Intelligence: Framework, Challenges, and Future Direction
Lee, Jay, Su, Hanqi, Ji, Dai-Yan, Minami, Takanobu
Over the past ten years, the application of artificial intelligence (AI) and machine learning (ML) in engineering domains has gained significant popularity, showcasing their potential in data-driven contexts. However, the complexity and diversity of engineering problems often require the development of domain-specific AI approaches, which are frequently hindered by a lack of systematic methodologies, scalability, and robustness during the development process. To address this gap, this paper introduces the "ABCDE" as the key elements of Engineering AI and proposes a unified, systematic engineering AI ecosystem framework, including eight essential layers, along with attributes, goals, and applications, to guide the development and deployment of AI solutions for specific engineering needs. Additionally, key challenges are examined, and eight future research directions are highlighted. By providing a comprehensive perspective, this paper aims to advance the strategic implementation of AI, fostering the development of next-generation engineering AI solutions.
Disentangling Bias by Modeling Intra- and Inter-modal Causal Attention for Multimodal Sentiment Analysis
Jiang, Menghua, Lin, Yuxia, Chen, Baoliang, Hu, Haifeng, Jiang, Yuncheng, Mai, Sijie
Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating information from multiple modalities, such as text, audio, and visual data. However, existing methods often suffer from spurious correlations both within and across modalities, leading models to rely on statistical shortcuts rather than true causal relationships, thereby undermining generalization. To mitigate this issue, we propose a Multi-relational Multimodal Causal Intervention (MMCI) model, which leverages the backdoor adjustment from causal theory to address the confounding effects of such shortcuts. Specifically, we first model the multimodal inputs as a multi-relational graph to explicitly capture intra- and inter-modal dependencies. Then, we apply an attention mechanism to separately estimate and disentangle the causal features and shortcut features corresponding to these intra- and inter-modal relations. Finally, by applying the backdoor adjustment, we stratify the shortcut features and dynamically combine them with the causal features to encourage MMCI to produce stable predictions under distribution shifts. Extensive experiments on several standard MSA datasets and out-of-distribution (OOD) test sets demonstrate that our method effectively suppresses biases and improves performance.
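The backdoor adjustment mentioned in the abstract above can be illustrated with a small discrete example. This is only a sketch under stated assumptions, not the MMCI model: the binary confounder `Z` (standing in for a "shortcut" factor) and all probabilities are hypothetical toy values. It shows how stratifying over the confounder, P(Y | do(X)) = sum_z P(Y | X, z) P(z), differs from naive conditioning.

```python
import numpy as np

# Hypothetical toy numbers (not from the paper): binary confounder Z, binary X and Y.
p_z = np.array([0.5, 0.5])                 # P(Z=z)
p_x1_given_z = np.array([0.2, 0.8])        # P(X=1 | Z=z)
p_y1_given_xz = np.array([[0.1, 0.5],      # P(Y=1 | X=0, Z=z)
                          [0.3, 0.7]])     # P(Y=1 | X=1, Z=z)

def p_x_given_z(x):
    """P(X=x | Z=z) for each stratum z."""
    return p_x1_given_z if x == 1 else 1.0 - p_x1_given_z

def observational(x):
    """P(Y=1 | X=x): strata are weighted by P(Z | X=x), so confounding leaks in."""
    joint = p_x_given_z(x) * p_z
    p_z_given_x = joint / joint.sum()
    return float(p_y1_given_xz[x] @ p_z_given_x)

def backdoor(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) P(Z=z)."""
    return float(p_y1_given_xz[x] @ p_z)

obs_gap = observational(1) - observational(0)     # association inflated by Z
causal_gap = backdoor(1) - backdoor(0)            # effect after stratifying on Z
print(f"observational gap: {obs_gap:.2f}, backdoor-adjusted gap: {causal_gap:.2f}")
```

In this toy setting the observational gap overstates the adjusted one, which is the kind of shortcut-driven bias the adjustment is meant to remove.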
Multiple Time Series Fusion Based on LSTM: An Application to CAP A Phase Classification Using EEG
Mendonça, Fábio, Mostafa, Sheikh Shanawaz, Freitas, Diogo, Morgado-Dias, Fernando, Ravelo-García, Antonio G.
Biomedical decision making involves processing multiple signals, either from different sensors or from different channels. In both cases, information fusion plays a significant role. In this work, a deep learning based feature-level fusion of electroencephalogram channels is carried out for electroencephalogram cyclic alternating pattern (CAP) A phase classification. Channel selection, fusion, and classification procedures were optimized by two optimization algorithms, namely, Genetic Algorithm and Particle Swarm Optimization. The developed methodologies were evaluated by fusing the information from multiple electroencephalogram channels for patients with nocturnal frontal lobe epilepsy and patients without any neurological disorder, a significantly more challenging setting than those of other state-of-the-art works. Results showed that both optimization algorithms selected a comparable structure with similar feature-level fusion, consisting of three electroencephalogram channels, which is in line with the CAP protocol to ensure multiple channels' arousals for CAP detection. Moreover, the two optimized models reached an area under the receiver operating characteristic curve of 0.82, with average accuracy ranging from 77% to 79%, a result which is in the upper range of the specialist agreement. Despite the difficult dataset, the proposed approach remains in the upper range of the best state-of-the-art works and has the advantage of providing a fully automatic analysis without requiring any manual procedure. Finally, the models proved to be noise resistant and resilient to the loss of multiple channels.
Stakeholder Perspectives on Humanistic Implementation of Computer Perception in Healthcare: A Qualitative Study
Kostick-Quenet, Kristin M., Hurley, Meghan E., Ayaz, Syed, Herrington, John, Zampella, Casey, Parish-Morris, Julia, Tunç, Birkan, Lázaro-Muñoz, Gabriel, Blumenthal-Barby, J. S., Storch, Eric A.
Computer perception (CP) technologies (digital phenotyping, affective computing and related passive sensing approaches) offer unprecedented opportunities to personalize healthcare, but provoke concerns about privacy, bias and the erosion of empathic, relationship-centered practice. A comprehensive understanding of perceived risks, benefits, and implementation challenges from those who design, deploy and experience these tools in real-world settings remains elusive. This study provides the first evidence-based account of key stakeholder perspectives on the relational, technical, and governance challenges raised by the integration of CP technologies into patient care. We conducted in-depth, semi-structured interviews with 102 stakeholders: adolescent patients and their caregivers, frontline clinicians, technology developers, and ethics, legal, policy or philosophy scholars. Transcripts underwent thematic analysis by a multidisciplinary team; reliability was enhanced through double coding and consensus adjudication. Stakeholders articulated seven interlocking concern domains: (1) trustworthiness and data integrity; (2) patient-specific relevance; (3) utility and workflow integration; (4) regulation and governance; (5) privacy and data protection; (6) direct and indirect patient harms; and (7) philosophical critiques of reductionism. To operationalize humanistic safeguards, we propose "personalized roadmaps": co-designed plans that predetermine which metrics will be monitored, how and when feedback is shared, thresholds for clinical action, and procedures for reconciling discrepancies between algorithmic inferences and lived experience. By translating these insights into personalized roadmaps, we offer a practical framework for developers, clinicians and policymakers seeking to harness continuous behavioral data while preserving the humanistic core of care.
Effective Damage Data Generation by Fusing Imagery with Human Knowledge Using Vision-Language Models
Wei, Jie, Ardiles-Cruz, Erika, Panasyuk, Aleksey, Blasch, Erik
Prompt and accurate damage assessment is crucial in humanitarian assistance and disaster response (HADR). Current deep learning approaches struggle to generalize effectively due to the imbalance of data classes, scarcity of moderate damage examples, and human inaccuracy in pixel labeling during HADR situations. To address these limitations, state-of-the-art vision-language models (VLMs) can be exploited to fuse imagery with human knowledge, offering an opportunity to generate a diversified set of image-based damage data effectively. Our initial experimental results suggest encouraging data generation quality, which demonstrates an improvement in classifying scenes with different levels of structural damage to buildings, roads, and infrastructure.
DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition
Jiang, Peiyuan, Liu, Yao, Liu, Qiao, Zhang, Zongshun, Yang, Jiaye, Liu, Lu, Yao, Daibing
Multimodal emotion recognition (MER) aims to identify emotional states by integrating and analyzing information from multiple modalities. However, inherent modality heterogeneity and inconsistencies in emotional cues remain key challenges that hinder performance. To address these issues, we propose a Decoupled Representations with Knowledge Fusion (DRKF) method for MER. DRKF consists of two main modules: an Optimized Representation Learning (ORL) Module and a Knowledge Fusion (KF) Module. ORL employs a contrastive mutual information estimation method with progressive modality augmentation to decouple task-relevant shared representations and modality-specific features while mitigating modality heterogeneity. KF includes a lightweight self-attention-based Fusion Encoder (FE) that identifies the dominant modality and integrates emotional information from other modalities to enhance the fused representation. To handle potential errors from incorrect dominant modality selection under emotionally inconsistent conditions, we introduce an Emotion Discrimination Submodule (ED), which enforces the fused representation to retain discriminative cues of emotional inconsistency. This ensures that even if the FE selects an inappropriate dominant modality, the Emotion Classification Submodule (EC) can still make accurate predictions by leveraging preserved inconsistency information. Experiments show that DRKF achieves state-of-the-art (SOTA) performance on IEMOCAP, MELD, and M3ED. The source code is publicly available at https://github.com/PANPANKK/DRKF.
Augmented Vision-Language Models: A Systematic Review
Davis, Anthony C, Sadiq, Burhan, Shu, Tianmin, Huang, Chien-Ming
Recent advances in visual-language machine learning models have demonstrated exceptional ability to use natural language and understand visual scenes by training on large, unstructured datasets. However, this training paradigm cannot produce interpretable explanations for its outputs, requires retraining to integrate new information, is highly resource-intensive, and struggles with certain forms of logical reasoning. One promising solution involves integrating neural networks with external symbolic information systems, forming neural symbolic systems that can enhance reasoning and memory abilities. These neural symbolic systems provide more interpretable explanations to their outputs and the capacity to assimilate new information without extensive retraining. Utilizing powerful pre-trained Vision-Language Models (VLMs) as the core neural component, augmented by external systems, offers a pragmatic approach to realizing the benefits of neural-symbolic integration. This systematic literature review aims to categorize techniques through which visual-language understanding can be improved by interacting with external symbolic information systems.
SmartPNT-MSF: A Multi-Sensor Fusion Dataset for Positioning and Navigation Research
Zhu, Feng, Zhang, Zihang, Teng, Kangcheng, Yakup, Abduhelil, Zhang, Xiaohong
High-precision navigation and positioning systems are critical for applications in autonomous vehicles and mobile mapping, where robust and continuous localization is essential. To test and enhance the performance of algorithms, some research institutions and companies have successively constructed and publicly released datasets. However, existing datasets still suffer from limitations in sensor diversity and environmental coverage. To address these shortcomings and advance development in related fields, the SmartPNT Multisource Integrated Navigation, Positioning, and Attitude Dataset has been developed. This dataset integrates data from multiple sensors, including Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), optical cameras, and LiDAR, to provide a rich and versatile resource for research in multi-sensor fusion and high-precision navigation. The dataset construction process is thoroughly documented, encompassing sensor configurations, coordinate system definitions, and calibration procedures for both cameras and LiDAR. A standardized framework for data collection and processing ensures consistency and scalability, enabling large-scale analysis. Validation using state-of-the-art Simultaneous Localization and Mapping (SLAM) algorithms, such as VINS-Mono and LIO-SAM, demonstrates the dataset's applicability for advanced navigation research. Covering a wide range of real-world scenarios, including urban areas, campuses, tunnels, and suburban environments, the dataset offers a valuable tool for advancing navigation technologies and addressing challenges in complex environments. By providing a publicly accessible, high-quality dataset, this work aims to bridge gaps in sensor diversity, data accessibility, and environmental representation, fostering further innovation in the field.
Introduction. The continuous advancement of positioning and navigation technologies has driven rapid development across various domains. Feng Zhu is with the School of Geodesy and Geomatics, Wuhan University, Wuhan, Hubei 430079, China, and also with the Hubei Luojia Laboratory, Wuhan, Hubei 430079, China (e-mail: fzhu@whu.edu.cn). Zihang Zhang, Kangcheng Teng, and Abduhelil Yakup are with Wuhan University Technology, the School of Geodesy and Geomatics, Wuhan University, Wuhan, Hubei 430079, China (e-mail: zihangzhang@whu.edu.cn;
Experimentally-Driven Analysis of Stability in Connected Vehicle Platooning: Insights and Control Strategies
Dutta, Niladri, Abolfazli, Elham, Charalambous, Themistoklis
This paper presents the development of a tangible platform for demonstrating the practical implementation of cooperative adaptive cruise control (CACC) systems, an enhancement to the standard adaptive cruise control (ACC) concept by means of Vehicle-to-Everything (V2X) communication. It involves a detailed examination of existing longitudinal controllers and their performance in homogeneous vehicle platoons. Moreover, extensive tests are conducted using multiple autonomous experimental vehicle platform topologies to verify the effectiveness of the controller. The outcomes from both simulations and field tests affirm the substantial benefits of the proposed CACC platooning approach in longitudinal vehicle platooning scenarios. This research is crucial due to a notable gap in the existing literature: while numerous studies focus on simulated vehicle platooning systems, there is a lack of research demonstrating these controllers on physical vehicle systems or robot platforms. This paper seeks to fill this gap by providing a practical demonstration of CACC systems in action, showcasing their potential for real-world application in intelligent transportation systems. The growing dependence on cars has resulted in a large number of vehicles on the road, placing a significant strain on the road infrastructure and raising the risk of accidents and traffic congestion. Current research focuses on automotive system technology that adds intelligence to transportation systems in order to enhance traffic flow, road safety, and efficiency.
Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration
Baharav, Tavor Z., Nicol, Phillip B., Irizarry, Rafael A., Ma, Rong
Modern data analysis increasingly requires identifying shared latent structure across multiple high-dimensional datasets. A commonly used model assumes that the data matrices are noisy observations of low-rank matrices with a shared singular subspace. In this case, two primary methods have emerged for estimating this shared structure, which vary in how they integrate information across datasets. The first approach, termed Stack-SVD, concatenates all the datasets, and then performs a singular value decomposition (SVD). The second approach, termed SVD-Stack, first performs an SVD separately for each dataset, then aggregates the top singular vectors across these datasets, and finally computes a consensus amongst them. While these methods are widely used, they have not been rigorously studied in the proportional asymptotic regime, which is of great practical relevance in today's world of increasing data size and dimensionality. This lack of theoretical understanding has led to uncertainty about which method to choose and limited the ability to fully exploit their potential. To address these challenges, we derive exact expressions for the asymptotic performance and phase transitions of these two methods and develop optimal weighting schemes to further improve both methods. Our analysis reveals that while neither method uniformly dominates the other in the unweighted case, optimally weighted Stack-SVD dominates optimally weighted SVD-Stack. We extend our analysis to accommodate multiple shared components, and provide practical algorithms for estimating optimal weights from data, offering theoretical guidance for method selection in practical data integration problems. Extensive numerical simulations and semi-synthetic experiments on genomic data corroborate our theoretical findings.
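The two estimators described in the abstract above can be sketched in a few lines of NumPy. This is an unweighted illustration under stated assumptions, not the authors' implementation: rows are taken to be samples, the shared structure is assumed to lie in the right singular subspace, the "consensus" step of SVD-Stack is assumed to be a second SVD of the stacked top singular vectors, and the function names, signal strengths, and matrix sizes are hypothetical.

```python
import numpy as np

def stack_svd(mats, r=1):
    """Stack-SVD: concatenate datasets along the sample axis, then take one SVD."""
    stacked = np.vstack(mats)
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:r].T  # (d, r) estimate of the shared right singular subspace

def svd_stack(mats, r=1):
    """SVD-Stack: SVD each dataset, stack the top-r right singular vectors,
    then take a second SVD as the consensus step (an assumption of this sketch)."""
    tops = []
    for x in mats:
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        tops.append(vt[:r])
    _, _, vt = np.linalg.svd(np.vstack(tops), full_matrices=False)
    return vt[:r].T

# Synthetic rank-one data: X_i = theta_i * u_i v^T + Gaussian noise, shared v.
rng = np.random.default_rng(0)
n, d = 200, 50
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
mats = []
for theta in (100.0, 60.0, 40.0):  # hypothetical per-dataset signal strengths
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    mats.append(theta * np.outer(u, v) + rng.standard_normal((n, d)))

v_stack = stack_svd(mats)
v_consensus = svd_stack(mats)
# Absolute inner product with the true direction (sign of an SVD vector is arbitrary).
align_stack = abs(float(v_stack[:, 0] @ v))
align_consensus = abs(float(v_consensus[:, 0] @ v))
print(f"Stack-SVD alignment: {align_stack:.3f}, SVD-Stack alignment: {align_consensus:.3f}")
```

With signals this strong both estimators recover the shared direction well; the regimes in which they differ, and the optimal weighting schemes, are the subject of the paper's analysis and are not reproduced here.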