Information Fusion
Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport
Pan, Zecheng, Chen, Zhikang, Li, Ding, Zhang, Min, Cui, Sen, Jin, Hongshuo, Tao, Luqi, Yang, Yi, Ye, Deheng, Zhang, Yu, Zhu, Tingting, Ren, Tianling
Merging models fine-tuned for different tasks into a single unified model has become an increasingly important direction for building versatile, efficient multi-task systems. Existing approaches predominantly rely on parameter interpolation in weight space, which we show introduces significant distribution shift in the feature space and undermines task-specific knowledge. In this paper, we propose OTMF (Optimal Transport-based Masked Fusion), a novel model merging framework rooted in optimal transport theory to address the distribution shift that arises from naive parameter interpolation. Instead of directly aggregating features or weights, OTMF aligns the semantic geometry of task-specific models by discovering common masks applied to task vectors through optimal transport plans. These masks selectively extract transferable and task-agnostic components while preserving the unique structural identities of each task. To ensure scalability in real-world settings, OTMF further supports a continual fusion paradigm that incrementally integrates each new task vector without revisiting previous ones, maintaining a bounded memory footprint and enabling efficient fusion across a growing number of tasks. We conduct comprehensive experiments on multiple vision and language benchmarks, and results show that OTMF achieves state-of-the-art performance in terms of both accuracy and efficiency. These findings highlight the practical and theoretical value of our approach to model merging.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Integrating RCTs, RWD, AI/ML and Statistics: Next-Generation Evidence Synthesis
Yang, Shu, Gamalo, Margaret, Fu, Haoda
Randomized controlled trials (RCTs) have been the cornerstone of clinical evidence; however, their cost, duration, and restrictive eligibility criteria limit power and external validity. Studies using real-world data (RWD), historically considered less reliable for establishing causality, are now recognized to be important for generating real-world evidence (RWE). In parallel, artificial intelligence and machine learning (AI/ML) are being increasingly used throughout the drug development process, providing scalability and flexibility but also presenting challenges in interpretability and rigor that traditional statistics do not face. This Perspective argues that the future of evidence generation will not depend on RCTs versus RWD, or statistics versus AI/ML, but on their principled integration. To this end, a causal roadmap is needed to clarify inferential goals, make assumptions explicit, and ensure transparency about tradeoffs. We highlight key objectives of integrative evidence synthesis, including transporting RCT results to broader populations, embedding AI-assisted analyses within RCTs, designing hybrid controlled trials, and extending short-term RCTs with long-term RWD. We also outline future directions in privacy-preserving analytics, uncertainty quantification, and small-sample methods. By uniting statistical rigor with AI/ML innovation, integrative approaches can produce robust, transparent, and policy-relevant evidence, making them a key component of modern regulatory science.
- North America > United States > New York (0.04)
- North America > Greenland (0.04)
- North America > United States > North Carolina (0.04)
- (8 more...)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- (6 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
SWIR-LightFusion: Multi-spectral Semantic Fusion of Synthetic SWIR with Thermal IR (LWIR/MWIR) and RGB
Hussain, Muhammad Ishfaq, Van Linh, Ma, Naz, Zubia, Fatima, Unse, Ko, Yeongmin, Jeon, Moongu
Enhancing scene understanding in adverse visibility conditions remains a critical challenge for surveillance and autonomous navigation systems. Conventional imaging modalities, such as RGB and thermal infrared (MWIR / LWIR), when fused, often struggle to deliver comprehensive scene information, particularly under conditions of atmospheric interference or inadequate illumination. To address these limitations, Short-Wave Infrared (SWIR) imaging has emerged as a promising modality due to its ability to penetrate atmospheric disturbances and differentiate materials with improved clarity. However, the advancement and widespread implementation of SWIR-based systems face significant hurdles, primarily due to the scarcity of publicly accessible SWIR datasets. In response to this challenge, our research introduces an approach to synthetically generate SWIR-like structural/contrast cues (without claiming spectral reproduction) images from existing LWIR data using advanced contrast enhancement techniques. We then propose a multimodal fusion framework integrating synthetic SWIR, LWIR, and RGB modalities, employing an optimized encoder-decoder neural network architecture with modality-specific encoders and a softmax-gated fusion head. Comprehensive experiments on public RGB-LWIR benchmarks (M3FD, TNO, CAMEL, MSRS, RoadScene) and an additional private real RGB-MWIR-SWIR dataset demonstrate that our synthetic-SWIR-enhanced fusion framework improves fused-image quality (contrast, edge definition, structural fidelity) while maintaining real-time performance. We also add fair trimodal baselines (LP, LatLRR, GFF) and cascaded trimodal variants of U2Fusion/SwinFusion under a unified protocol. The outcomes highlight substantial potential for real-world applications in surveillance and autonomous systems.
- North America > United States (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Asia > South Korea > Daegu > Daegu (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
KGpipe: Generation and Evaluation of Pipelines for Data Integration into Knowledge Graphs
Building high-quality knowledge graphs (KGs) from diverse sources requires combining methods for information extraction, data transformation, ontology mapping, entity matching, and data fusion. Numerous methods and tools exist for each of these tasks, but support for combining them into reproducible and effective end-to-end pipelines is still lacking. We present a new framework, KGpipe for defining and executing integration pipelines that can combine existing tools or LLM (Large Language Model) functionality. To evaluate different pipelines and the resulting KGs, we propose a benchmark to integrate heterogeneous data of different formats (RDF, JSON, text) into a seed KG. We demonstrate the flexibility of KGpipe by running and comparatively evaluating several pipelines integrating sources of the same or different formats using selected performance and quality metrics.
- Europe > Germany > Saxony > Leipzig (1.00)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- (9 more...)
- Media > Film (0.93)
- Leisure & Entertainment (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Simulating an Autonomous System in CARLA using ROS 2
Abdo, Joseph, Shibu, Aditya, Saeed, Moaiz, Aga, Abdul Maajid, Sivaprazad, Apsara, Al-Musleh, Mohamed
Abstract--Autonomous racing offers a rigorous setting to stress test perception, planning, and control under high speed and uncertainty. This paper proposes an approach to design and evaluate a software stack for an autonomous race car in CARLA: Car Learning to Act simulator, targeting competitive driving performance in the Formula Student UK Driverless (FS-AI) 2025 competition. Optimized trajectories are computed considering vehicle dynamics and simulated environmental factors such as visibility and lighting to navigate the track efficiently. The complete autonomous stack is implemented in ROS 2 and validated extensively in CARLA on a dedicated vehicle (ADS-DV) before being ported to the actual hardware, which includes the Jetson AGX Orin 64GB, ZED2i Stereo Camera, Robosense Helios 16P LiDAR, and CHCNA V Inertial Navigation System (INS). The Formula Student Driverless (FS-AI) competition has stimulated research on autonomous racing software stacks validated through both real world testing and simulation.
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.05)
- Europe > Portugal > Porto > Porto (0.04)
FusionFM: All-in-One Multi-Modal Image Fusion with Flow Matching
Zhu, Huayi, Shu, Xiu, Xiong, Youqiang, Liu, Qiao, Chen, Rui, Yuan, Di, Chang, Xiaojun, He, Zhenyu
Current multi-modal image fusion methods typically rely on task-specific models, leading to high training costs and limited scalability. While generative methods provide a unified modeling perspective, they often suffer from slow inference due to the complex sampling trajectories from noise to image. To address this, we formulate image fusion as a direct probabilistic transport from source modalities to the fused image distribution, leveraging the flow matching paradigm to improve sampling efficiency and structural consistency. To mitigate the lack of high-quality fused images for supervision, we collect fusion results from multiple state-of-the-art models as priors, and employ a task-aware selection function to select the most reliable pseudo-labels for each task. We further introduce a Fusion Refiner module that employs a divide-and-conquer strategy to systematically identify, decompose, and enhance degraded components in selected pseudo-labels. For multi-task scenarios, we integrate elastic weight consolidation and experience replay mechanisms to preserve cross-task performance and enhance continual learning ability from both parameter stability and memory retention perspectives. Our approach achieves competitive performance across diverse fusion tasks, while significantly improving sampling efficiency and maintaining a lightweight model design. The code will be available at: https://github.com/Ist-Zhy/FusionFM.
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- Oceania > Australia (0.04)
- (3 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.89)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.46)
Conditional Information Bottleneck for Multimodal Fusion: Overcoming Shortcut Learning in Sarcasm Detection
Wang, Yihua, Jia, Qi, Xu, Cong, Chen, Feiyu, Liu, Yuhan, Zhang, Haotian, Jin, Liang, Liu, Lu, Wang, Zhichun
Multimodal sarcasm detection is a complex task that requires distinguishing subtle complementary signals across modalities while filtering out irrelevant information. Many advanced methods rely on learning shortcuts from datasets rather than extracting intended sarcasm-related features. However, our experiments show that shortcut learning impairs the model's generalization in real-world scenarios. Furthermore, we reveal the weaknesses of current modality fusion strategies for multimodal sarcasm detection through systematic experiments, highlighting the necessity of focusing on effective modality fusion for complex emotion recognition. To address these challenges, we construct MUStARD++$^{R}$ by removing shortcut signals from MUStARD++. Then, a Multimodal Conditional Information Bottleneck (MCIB) model is introduced to enable efficient multimodal fusion for sarcasm detection. Experimental results show that the MCIB achieves the best performance without relying on shortcut learning.
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (2 more...)
MASt3R-Fusion: Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM
Zhou, Yuxuan, Li, Xingxing, Li, Shengyu, Yan, Zhuohao, Xia, Chunxi, Feng, Shaoquan
Visual SLAM is a cornerstone technique in robotics, autonomous driving and extended reality (XR), yet classical systems often struggle with low-texture environments, scale ambiguity, and degraded performance under challenging visual conditions. Recent advancements in feed-forward neural network-based pointmap regression have demonstrated the potential to recover high-fidelity 3D scene geometry directly from images, leveraging learned spatial priors to overcome limitations of traditional multi-view geometry methods. However, the widely validated advantages of probabilistic multi-sensor information fusion are often discarded in these pipelines. In this work, we propose MASt3R-Fusion,a multi-sensor-assisted visual SLAM framework that tightly integrates feed-forward pointmap regression with complementary sensor information, including inertial measurements and GNSS data. The system introduces Sim(3)-based visualalignment constraints (in the Hessian form) into a universal metric-scale SE(3) factor graph for effective information fusion. A hierarchical factor graph design is developed, which allows both real-time sliding-window optimization and global optimization with aggressive loop closures, enabling real-time pose tracking, metric-scale structure perception and globally consistent mapping. We evaluate our approach on both public benchmarks and self-collected datasets, demonstrating substantial improvements in accuracy and robustness over existing visual-centered multi-sensor SLAM systems. The code will be released open-source to support reproducibility and further research (https://github.com/GREAT-WHU/MASt3R-Fusion).
- Transportation > Ground > Road (0.34)
- Information Technology (0.34)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.86)
Case study of a differentiable heterogeneous multiphysics solver for a nuclear fusion application
Coughlin, Jack B., Joglekar, Archis, Brodrick, Jonathan, Lavin, Alexander
This work presents a case study of a heterogeneous multiphysics solver from the nuclear fusion domain. At the macroscopic scale, an auto-differentiable ODE solver in JAX computes the evolution of the pulsed power circuit and bulk plasma parameters for a compressing Z Pinch. The ODE solver requires a closure for the impedance of the plasma load obtained via root-finding at every timestep, which we solve efficiently using gradient-based Newton iteration. However, incorporating non-differentiable production-grade plasma solvers like Gkeyll (a C/CUDA plasma simulation suite) into a gradient-based workflow is non-trivial. The ''Tesseract'' software addresses this challenge by providing a multi-physics differentiable abstraction layer made fully compatible with JAX (through the `tesseract_jax` adapter). This architecture ensures end-to-end differentiability while allowing seamless interchange between high-fidelity solvers (Gkeyll), neural surrogates, and analytical approximations for rapid, progressive prototyping.
- North America > United States > New York > Kings County > New York City (0.05)
- North America > United States > Illinois (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Bristol (0.04)