AITopics | Information Fusion

Collaborating Authors

Information Fusion

News Overviews Instructional Materials AI-Alerts Classics

Variational Task Vector Composition

Zhang, Boyuan, Du, Yingjun, Zhen, Xiantong, Shao, Ling

arXiv.org Artificial IntelligenceSep-24-2025

Task vectors capture how a model changes during fine-tuning by recording the difference between pre-trained and task-specific weights. The composition of task vectors, a key operator in task arithmetic, enables models to integrate knowledge from multiple tasks without incurring additional inference costs. In this paper, we propose variational task vector composition, where composition coefficients are taken as latent variables and estimated in a Bayesian inference framework. Unlike previous methods that operate at the task level, our framework focuses on sample-specific composition. Motivated by the observation of structural redundancy in task vectors, we introduce a Spike-and-Slab prior that promotes sparsity and preserves only the most informative components. To further address the high variance and sampling inefficiency in sparse, high-dimensional spaces, we develop a gated sampling mechanism that constructs a controllable posterior by filtering the composition coefficients based on both uncertainty and importance. This yields a more stable and interpretable variational framework by deterministically selecting reliable task components, reducing sampling variance while improving transparency and generalization. Experimental results demonstrate that our method consistently outperforms existing approaches across all datasets by selectively leveraging the most reliable and informative components in task vectors. These findings highlight the practical value of our approach, establishing a new standard for efficient and effective task vector composition.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.18208

Country:

North America > United States (0.46)
Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
(4 more...)

Add feedback

Can multimodal representation learning by alignment preserve modality-specific information?

Thoreau, Romain, Levillain, Jessie, Derksen, Dawa

arXiv.org Artificial IntelligenceSep-23-2025

Combining multimodal data is a key issue in a wide range of machine learning tasks, including many remote sensing problems. In Earth observation, early multimodal data fusion methods were based on specific neural network architectures and supervised learning. Ever since, the scarcity of labeled data has motivated self-supervised learning techniques. State-of-the-art multimodal representation learning techniques leverage the spatial alignment between satellite data from different modalities acquired over the same geographic area in order to foster a semantic alignment in the latent space. In this paper, we investigate how this methods can preserve task-relevant information that is not shared across modalities. First, we show, under simplifying assumptions, when alignment strategies fundamentally lead to an information loss. Then, we support our theoretical insight through numerical experiments in more realistic settings. With those theoretical and empirical evidences, we hope to support new developments in contrastive learning for the combination of multimodal satellite data. Our code and data is publicly available at https://github.com/Romain3Ch216/alg_maclean_25.

artificial intelligence, information, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2509.17943

Genre: Research Report (0.83)

Industry: Energy (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.54)

Add feedback

From Mimicry to True Intelligence (TI) -- A New Paradigm for Artificial General Intelligence

Subasioglu, Meltem, Subasioglu, Nevzat

arXiv.org Artificial IntelligenceSep-23-2025

The debate around Artificial General Intelligence (AGI) remains open due to two fundamentally different goals: replicating human-like performance versus replicating human-like cognitive processes. We argue that current performance-based definitions are inadequate because they provide no clear, mechanism-focused roadmap for research, and they fail to properly define the qualitative nature of genuine intelligence. Drawing inspiration from the human brain, we propose a new paradigm that shifts the focus from external mimicry to the development of foundational cognitive architectures. We define True Intelligence (TI) as a system characterized by six core components: embodied sensory fusion, core directives, dynamic schemata creation, a highly-interconnected multi-expert architecture, an orchestration layer, and lastly, the unmeasurable quality of Interconnectedness, which we hypothesize results in consciousness and a subjective experience. We propose a practical, five-level taxonomy of AGI based on the number of the first five measurable components a system exhibits. This framework provides a clear path forward with developmental milestones that directly address the challenge of building genuinely intelligent systems. We contend that once a system achieves Level-5 AGI by implementing all five measurable components, the difference between it and TI remains as a purely philosophical debate. For practical purposes - and given theories indicate consciousness is an emergent byproduct of integrated, higher-order cognition - we conclude that a fifth-level AGI is functionally and practically equivalent to TI. This work synthesizes diverse insights from analytical psychology, schema theory, metacognition, modern brain architectures and latest works in AI to provide the first holistic, mechanism-based definition of AGI that offers a clear and actionable path for the research community.

evolutionary algorithm, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2509.14474

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.93)
(5 more...)

Add feedback

Semantic-LiDAR-Inertial-Wheel Odometry Fusion for Robust Localization in Large-Scale Dynamic Environments

Jiang, Haoxuan, Qian, Peicong, Xie, Yusen, Zheng, Linwei, Li, Xiaocong, Liu, Ming, Ma, Jun

arXiv.org Artificial IntelligenceSep-19-2025

Reliable, drift-free global localization presents significant challenges yet remains crucial for autonomous navigation in large-scale dynamic environments. In this paper, we introduce a tightly-coupled Semantic-LiDAR-Inertial-Wheel Odometry fusion framework, which is specifically designed to provide high-precision state estimation and robust localization in large-scale dynamic environments. Our framework leverages an efficient semantic-voxel map representation and employs an improved scan matching algorithm, which utilizes global semantic information to significantly reduce long-term trajectory drift. Furthermore, it seamlessly fuses data from LiDAR, IMU, and wheel odometry using a tightly-coupled multi-sensor fusion Iterative Error-State Kalman Filter (iESKF). This ensures reliable localization without experiencing abnormal drift. Moreover, to tackle the challenges posed by terrain variations and dynamic movements, we introduce a 3D adaptive scaling strategy that allows for flexible adjustments to wheel odometry measurement weights, thereby enhancing localization precision. This study presents extensive real-world experiments conducted in a one-million-square-meter automated port, encompassing 3,575 hours of operational data from 35 Intelligent Guided Vehicles (IGVs). The results consistently demonstrate that our system outperforms state-of-the-art LiDAR-based localization methods in large-scale dynamic environments, highlighting the framework's reliability and practical value.

artificial intelligence, localization, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.14999

Country: Asia > China > Guangdong Province (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.34)

Add feedback

The Mean of Multi-Object Trajectories

Nguyen, Tran Thien Dat, Vo, Ba Tuong, Vo, Ba-Ngu, Van Nguyen, Hoa, Shim, Changbeom

arXiv.org Artificial IntelligenceSep-19-2025

This paper introduces the concept of a mean for trajectories and multi-object trajectories (defined as sets or multi-sets of trajectories) along with algorithms for computing them. Specifically, we use the Fréchet mean, and metrics based on the optimal sub-pattern assignment (OSPA) construct, to extend the notion of average from vectors to trajectories and multi-object trajectories. Further, we develop efficient algorithms to compute these means using greedy search and Gibbs sampling. Using distributed multi-object tracking as an application, we demonstrate that the Fréchet mean approach to multi-object trajectory consensus significantly outperforms state-of-the-art distributed multi-object tracking methods.

artificial intelligence, machine learning, trajectory, (18 more...)

arXiv.org Artificial Intelligence

2504.20391

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
(3 more...)

Add feedback

A Multimodal Foundation Model to Enhance Generalizability and Data Efficiency for Pan-cancer Prognosis Prediction

Zhou, Huajun, Zhou, Fengtao, Ma, Jiabo, Xu, Yingxue, Wang, Xi, Zhang, Xiuming, Liang, Li, Li, Zhenhui, Chen, Hao

arXiv.org Artificial IntelligenceSep-17-2025

Multimodal data provides heterogeneous information for a holistic understanding of the tumor microenvironment. However, existing AI models often struggle to harness the rich information within multimodal data and extract poorly generalizable representations. Here we present MICE (Multimodal data Integration via Collaborative Experts), a multimodal foundation model that effectively integrates pathology images, clinical reports, and genomics data for precise pan-cancer prognosis prediction. Instead of conventional multi-expert modules, MICE employs multiple functionally diverse experts to comprehensively capture both cross-cancer and cancer-specific insights. Leveraging data from 11,799 patients across 30 cancer types, we enhanced MICE's generalizability by coupling contrastive and supervised learning. MICE outperformed both unimodal and state-of-the-art multi-expert-based multimodal models, demonstrating substantial improvements in C-index ranging from 3.8% to 11.2% on internal cohorts and 5.8% to 8.8% on independent cohorts, respectively. Moreover, it exhibited remarkable data efficiency across diverse clinical scenarios. With its enhanced generalizability and data efficiency, MICE establishes an effective and scalable foundation for pan-cancer prognosis prediction, holding strong potential to personalize tailored therapies and improve treatment outcomes.

artificial intelligence, information fusion, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.126

Country: Asia > China (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)
Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)

Add feedback

DyKen-Hyena: Dynamic Kernel Generation via Cross-Modal Attention for Multimodal Intent Recognition

Wang, Yifei, Wang, Wenbin, Luo, Yong

arXiv.org Artificial IntelligenceSep-15-2025

Though Multimodal Intent Recognition (MIR) proves effective by utilizing rich information from multiple sources (e.g., language, video, and audio), the potential for intent-irrelevant and conflicting information across modalities may hinder performance from being further improved. Most current models attempt to fuse modalities by applying mechanisms like multi-head attention to unimodal feature sequences and then adding the result back to the original representation. This process risks corrupting the primary linguistic features with noisy or irrelevant non-verbal signals, as it often fails to capture the fine-grained, token-level influence where non-verbal cues should modulate, not just augment, textual meaning. To address this, we introduce DyKen-Hyena, which reframes the problem from feature fusion to processing modulation. Our model translates audio-visual cues into dynamic, per-token convolutional kernels that directly modulate textual feature extraction. This fine-grained approach achieves state-of-the-art results on the MIntRec and MIntRec2.0 benchmarks. Notably, it yields a +10.46% F1-score improvement in out-of-scope detection, validating that our method creates a fundamentally more robust intent representation.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.0994

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.48)

Add feedback

Personalized and Demand-Based Education Concept: Practical Tools for Control Engineers

Varga, Balint, Fischer, Lars, Kovacs, Levente

arXiv.org Artificial IntelligenceSep-11-2025

This paper presents a personalized lecture concept using educational blocks and its demonstrative application in a new university lecture. Higher education faces daily challenges: deep and specialized knowledge is available from everywhere and accessible to almost everyone. University lecturers of specialized master courses confront the problem that their lectures are either too boring or too complex for the attending students. Additionally, curricula are changing more rapidly than they have in the past 10-30 years. The German education system comprises different educational forms, with universities providing less practical content. Consequently, many university students do not obtain the practical skills they should ideally gain through university lectures. Therefore, in this work, a new lecture concept is proposed based on the extension of the just-in-time teaching paradigm: Personalized and Demand-Based Education. This concept includes: 1) an initial assessment of students' backgrounds, 2) selecting the appropriate educational blocks, and 3) collecting ongoing feedback during the semester. The feedback was gathered via Pingo, ensuring anonymity for the students. Our concept was exemplarily tested in the new lecture "Practical Tools for Control Engineers" at the Karlsruhe Institute of Technology. The initial results indicate that our proposed concept could be beneficial in addressing the current challenges in higher education.

artificial intelligence, machine learning, student, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.ifacol.2025.08.015

2504.07466

Country:

North America > United States (0.68)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.26)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Educational Setting > Higher Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)

Add feedback

FusWay: Multimodal hybrid fusion approach. Application to Railway Defect Detection

Zhukov, Alexey, Benois-Pineau, Jenny, Youssef, Amira, Zemmari, Akka, Mosbah, Mohamed, Taillandier, Virginie

arXiv.org Artificial IntelligenceSep-10-2025

Multimodal fusion is a multimedia technique that has become popular in the wide range of tasks where image information is accompanied by a signal/audio. The latter may not convey highly semantic information, such as speech or music, but some measures such as audio signal recorded by mics in the goal to detect rail structure elements or defects. While classical detection approaches such as You Only Look Once (YOLO) family detectors can be efficiently deployed for defect detection on the image modality, the single modality approaches remain limited. They yield an overdetection in case of the appearance similar to normal structural elements. The paper proposes a new multimodal fusion architecture built on the basis of domain rules with YOLO and Vision transformer backbones. It integrates YOLOv8n for rapid object detection with a Vision Transformer (ViT) to combine feature maps extracted from multiple layers (7, 16, and 19) and synthesised audio representations for two defect classes: rail Rupture and Surface defect. Fusion is performed between audio and image. Experimental evaluation on a real-world railway dataset demonstrates that our multimodal fusion improves precision and overall accuracy by 0.2 points compared to the vision-only approach. Student's unpaired t-test also confirms statistical significance of differences in the mean accuracy.

artificial intelligence, fuzzy logic, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.06987

Country: Europe > France (0.29)

Genre: Research Report > Experimental Study (0.86)

Industry: Transportation > Ground > Rail (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.68)

Add feedback

A Knowledge-Guided Cross-Modal Feature Fusion Model for Local Traffic Demand Prediction

Zhang, Lingyu, Xu, Pengfei, Wu, Guobin, Liang, Jian, Dong, Ruiyang, Wang, Yunhai, Song, Xuan

arXiv.org Artificial IntelligenceSep-10-2025

Traffic demand prediction plays a critical role in intelligent transportation systems. Existing traffic prediction models primarily rely on temporal traffic data, with limited efforts incorporating human knowledge and experience for urban traffic demand forecasting. However, in real-world scenarios, traffic knowledge and experience derived from human daily life significantly influence precise traffic prediction. Such knowledge and experiences can guide the model in uncovering latent patterns within traffic data, thereby enhancing the accuracy and robustness of predictions. To this end, this paper proposes integrating structured temporal traffic data with textual data representing human knowledge and experience, resulting in a novel knowledge-guided cross-modal feature representation learning (KGCM) model for traffic demand prediction. Based on regional transportation characteristics, we construct a prior knowledge dataset using a large language model combined with manual authoring and revision, covering both regional and global knowledge and experiences. The KGCM model then learns multimodal data features through designed local and global adaptive graph networks, as well as a cross-modal feature fusion mechanism. A proposed reasoning-based dynamic update strategy enables dynamic optimization of the graph model's parameters, achieving optimal performance. Experiments on multiple traffic datasets demonstrate that our model accurately predicts future traffic demand and outperforms existing state-of-the-art (SOTA) models.

artificial intelligence, knowledge-guided cross-modal feature fusion model, natural language, (2 more...)

arXiv.org Artificial Intelligence

2509.06976

Genre: Research Report (0.40)

Industry: Telecommunications > Networks (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.40)

Add feedback