AITopics

2506.02554

Country: Europe (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (0.86)
Automobiles & Trucks (0.85)
Information Technology > Robotics & Automation (0.71)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Data Science > Data Integration (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Tripathi, Kumud, Kumar, Chowdam Venkata, Wasnik, Pankaj

Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion

V oice Activity Detection (V AD) plays a key role in speech processing, often utilizing hand-crafted or neural features. This study examines the effectiveness of Mel-Frequency Cepstral Coefficients (MFCCs) and pre-trained model (PTM) features, including wav2vec 2.0, HuBERT, WavLM, UniSpeech, MMS, and Whisper. We propose FusionV AD, a unified framework that combines both feature types using three fusion strategies: concatenation, addition, and cross-attention (CA). Experimental results reveal that simple fusion techniques, particularly addition, outperform CA in both accuracy and efficiency. Fusion-based models consistently surpass single-feature models, highlighting the complementary nature of MFCCs and PTM features. Notably, our best-performing fusion model exceeds the state-of-the-art Pyannote across multiple datasets, achieving an absolute average improvement of 2.04%. These results confirm that simple feature fusion enhances V AD robustness while maintaining computational efficiency.

artificial intelligence, machine learning, representation, (16 more...)

2506.01365

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.73)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

Modality Equilibrium Matters: Minor-Modality-Aware Adaptive Alternating for Cross-Modal Memory Enhancement

Shi, Xiang, Zhang, Rui, Liu, Jiawei, Liu, Yinpeng, Cheng, Qikai, Lu, Wei

Multimodal fusion is susceptible to modality imbalance, where dominant modalities overshadow weak ones, easily leading to biased learning and suboptimal fusion, especially for incomplete modality conditions. T o address this problem, we propose a Shapley-guided alternating training framework that adaptively prioritizes minor modalities to balance and thus enhance the fusion. Our method leverages Shapley V alue-based scheduling to improve the training sequence adaptively, ensuring that under-optimized modalities receive sufficient learning. Additionally, we introduce the memory module to refine and inherit modality-specific representations with a cross-modal mapping mechanism to align features at both the feature and sample levels. T o further validate the adaptability of the proposed approach, the encoder module empirically adopts both conventional and LLM-based backbones. With building up a novel mul-timodal equilibrium metric, namely, equilibrium deviation metric (EDM), we evaluate the performance in both balance and accuracy across four multimodal benchmark datasets, where our method achieves state-of-the-art (SOTA) results. Meanwhile, robustness analysis under missing modalities highlights its strong generalization capabilities. Accordingly, our findings reveal the untapped potential of alternating training, demonstrating that strategic modality priori-tization fundamentally balances and promotes multimodal learning, offering a new paradigm for optimizing multi-modal training dynamics.

artificial intelligence, machine learning, natural language, (17 more...)

2506.0003

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.46)

GenDMR: A dynamic multimodal role-swapping network for identifying risk gene phenotypes

Qin, Lina, Zhu, Cheng, Zhou, Chuqi, Huang, Yukun, Zhu, Jiayi, Liang, Ping, Wang, Jinju, Huang, Yixing, Luo, Cheng, Yao, Dezhong, Tan, Ying

Recent studies have shown that integrating multimodal data fusion techniques for imaging and genetic features is beneficial for the etiological analysis and predictive diagnosis of Alzheimer's disease (AD). However, there are several critical flaws in current deep learning methods. Firstly, there has been insufficient discussion and exploration regarding the selection and encoding of genetic information. Secondly, due to the significantly superior classification value of AD imaging features compared to genetic features, many studies in multimodal fusion emphasize the strengths of imaging features, actively mitigating the influence of weaker features, thereby diminishing the learning of the unique value of genetic features. To address this issue, this study proposes the dynamic multimodal role-swapping network (GenDMR). In GenDMR, we develop a novel approach to encode the spatial organization of single nucleotide polymorphisms (SNPs), enhancing the representation of their genomic context. Additionally, to adaptively quantify the disease risk of SNPs and brain region, we propose a multi-instance attention module to enhance model interpretability. Furthermore, we introduce a dominant modality selection module and a contrastive self-distillation module, combining them to achieve a dynamic teacher-student role exchange mechanism based on dominant and auxiliary modalities for bidirectional co-updating of different modal data. Finally, GenDMR achieves state-of-the-art performance on the ADNI public dataset and visualizes attention to different SNPs, focusing on confirming 12 potential high-risk genes related to AD, including the most classic APOE and recently highlighted significant risk genes. This demonstrates GenDMR's interpretable analytical capability in exploring AD genetic features, providing new insights and perspectives for the development of multimodal data fusion techniques.

artificial intelligence, machine learning, modality, (17 more...)

2506.01456

Country:

Europe (1.00)
Asia > China (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.94)

Gao, Yizhou, Barfoot, Tim

Tunable Virtual IMU Frame by Weighted Averaging of Multiple Non-Collocated IMUs

-- We present a new method to combine several rigidly connected but physically separated IMUs through a weighted average into a single virtual IMU (VIMU). This has the benefits of (i) reducing process noise through averaging, and (ii) allowing for tuning the location of the VIMU. The VIMU can be placed to be coincident with, for example, a camera frame or GNSS frame, thereby offering a quality-of-life improvement for users. Specifically, our VIMU removes the need to consider any lever-arm terms in the propagation model. We also present a quadratic programming method for selecting the weights to minimize the noise of the VIMU while still selecting the placement of its reference frame. We tested our method in simulation and validated it on a real dataset. The results show that our averaging technique works for IMUs with large separation and performance gain is observed in both the simulation and the real experiment compared to using only a single IMU.

artificial intelligence, imus, information fusion, (17 more...)

2506.00371

Country:

Europe (0.46)
North America > Canada > Ontario (0.28)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.94)

Lanners, Quinn, Rudin, Cynthia, Volfovsky, Alexander, Parikh, Harsh

Data Fusion for Partial Identification of Causal Effects

arXiv.org Artificial IntelligenceJun-2-2025

Data fusion techniques integrate information from heterogeneous data sources to improve learning, generalization, and decision making across data sciences. In causal inference, these methods leverage rich observational data to improve causal effect estimation, while maintaining the trustworthiness of randomized controlled trials. Existing approaches often relax the strong no unobserved confounding assumption by instead assuming exchangeability of counterfactual outcomes across data sources. However, when both assumptions simultaneously fail - a common scenario in practice - current methods cannot identify or estimate causal effects. We address this limitation by proposing a novel partial identification framework that enables researchers to answer key questions such as: Is the causal effect positive or negative? and How severe must assumption violations be to overturn this conclusion? Our approach introduces interpretable sensitivity parameters that quantify assumption violations and derives corresponding causal effect bounds. We develop doubly robust estimators for these bounds and operationalize breakdown frontier analysis to understand how causal conclusions change as assumption violations increase. We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance. Our analysis reveals that the Project STAR results are robust to simultaneous violations of key assumptions, both on average and across various subgroups of interest. This strengthens confidence in the study's conclusions despite potential unmeasured biases in the data.

artificial intelligence, exp, information fusion, (17 more...)

2505.24296

Country: North America > United States (0.46)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.67)
Education > Assessment & Standards > Student Performance (0.48)
Education > Educational Setting > K-12 Education > Primary School (0.34)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)

Neural Information Processing SystemsMay-29-2025, 08:22:09 GMT

GAR: Generalized Autoregression for Multi-Fidelity Fusion Yuxin Wang

In many scientific research and engineering applications where repeated simulations of complex systems are conducted, a surrogate is commonly adopted to quickly estimate the whole system. To reduce the expensive cost of generating training examples, it has become a promising approach to combine the results of low-fidelity (fast but inaccurate) and high-fidelity (slow but accurate) simulations. Despite the fast developments of multi-fidelity fusion techniques, most existing methods require particular data structures and do not scale well to high-dimensional output. To resolve these issues, we generalize the classic autoregression (AR), which is wildly used due to its simplicity, robustness, accuracy, and tractability, and propose generalized autoregression (GAR) using tensor formulation and latent features. GAR can deal with arbitrary dimensional outputs and arbitrary multifidelity data structure to satisfy the demand of multi-fidelity fusion for complex problems; it admits a fully tractable likelihood and posterior requiring no approximate inference and scales well to high-dimensional problems.

artificial intelligence, machine learning, optimization, (18 more...)

Neural Information Processing Systems

Country:

Asia > China (0.28)
North America > United States (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.48)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceMay-29-2025

Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms

Peng, Yuanzhe, Bian, Jieming, Wang, Lei, Huang, Yin, Xu, Jie

Multimodal Federated Learning (MFL) lies at the intersection of two pivotal research areas: leveraging complementary information from multiple modalities to improve downstream inference performance and enabling distributed training to enhance efficiency and preserve privacy. Despite the growing interest in MFL, there is currently no comprehensive taxonomy that organizes MFL through the lens of different Federated Learning (FL) paradigms. This perspective is important because multimodal data introduces distinct challenges across various FL settings. These challenges, including modality heterogeneity, privacy heterogeneity, and communication inefficiency, are fundamentally different from those encountered in traditional unimodal or non-FL scenarios. In this paper, we systematically examine MFL within the context of three major FL paradigms: horizontal FL (HFL), vertical FL (VFL), and hybrid FL. For each paradigm, we present the problem formulation, review representative training algorithms, and highlight the most prominent challenge introduced by multimodal data in distributed settings. We also discuss open challenges and provide insights for future research. By establishing this taxonomy, we aim to uncover the novel challenges posed by multimodal data from the perspective of different FL paradigms and to offer a new lens through which to understand and advance the development of MFL.

artificial intelligence, information fusion, machine learning, (17 more...)

2505.21792

Country:

Europe > Switzerland (0.28)
North America > United States > Florida > Alachua County > Gainesville (0.14)

Genre:

Overview (0.67)
Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.93)
Health & Medicine > Therapeutic Area (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.87)

arXiv.org Artificial IntelligenceMay-29-2025

Learning to See More: UAS-Guided Super-Resolution of Satellite Imagery for Precision Agriculture

Masrur, Arif, Olsen, Peder A., Adler, Paul R., Jackson, Carlan, Myers, Matthew W., Sedghi, Nathan, Weil, Ray R.

Unmanned Aircraft Systems (UAS) and satellites are key data sources for precision agriculture, yet each presents trade-offs. Satellite data offer broad spatial, temporal, and spectral coverage but lack the resolution needed for many precision farming applications, while UAS provide high spatial detail but are limited by coverage and cost, especially for hyperspectral data. This study presents a novel framework that fuses satellite and UAS imagery using super-resolution methods. By integrating data across spatial, spectral, and temporal domains, we leverage the strengths of both platforms cost-effectively. We use estimation of cover crop biomass and nitrogen (N) as a case study to evaluate our approach. By spectrally extending UAS RGB data to the vegetation red edge and near-infrared regions, we generate high-resolution Sentinel-2 imagery and improve biomass and N estimation accuracy by 18% and 31%, respectively. Our results show that UAS data need only be collected from a subset of fields and time points. Farmers can then 1) enhance the spectral detail of UAS RGB imagery; 2) increase the spatial resolution by using satellite data; and 3) extend these enhancements spatially and across the growing season at the frequency of the satellite flights. Our SRCNN-based spectral extension model shows considerable promise for model transferability over other cropping systems in the Upper and Lower Chesapeake Bay regions. Additionally, it remains effective even when cloud-free satellite data are unavailable, relying solely on the UAS RGB input. The spatial extension model produces better biomass and N predictions than models built on raw UAS RGB images. Once trained with targeted UAS RGB data, the spatial extension model allows farmers to stop repeated UAS flights. While we introduce super-resolution advances, the core contribution is a lightweight and scalable system for affordable on-farm use.

artificial intelligence, machine learning, resolution, (19 more...)

2505.21746

Country: North America > United States > Maryland > Prince George's County (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Food & Agriculture > Agriculture (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.53)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
(3 more...)

arXiv.org Artificial IntelligenceMay-29-2025

Real-World Deployment of Cloud Autonomous Mobility System Using 5G Networks for Outdoor and Indoor Environments

Yang, Yufeng, Ning, Minghao, Shu, Keqi, Saleh, Aladdin, Hashemi, Ehsan, Khajepour, Amir

The growing complexity of both outdoor and indoor mobility systems demands scalable, cost-effective, and reliable perception and communication frameworks. This work presents the real-world deployment and evaluation of a Cloud Autonomous Mobility (CAM) system that leverages distributed sensor nodes connected via 5G networks, which integrates LiDAR- and camera-based perception at infrastructure units, cloud computing for global information fusion, and Ultra-Reliable Low Latency Communications (URLLC) to enable real-time situational awareness and autonomous operation. The CAM system is deployed in two distinct environments: a dense urban roundabout and a narrow indoor hospital corridor. Field experiments show improved traffic monitoring, hazard detection, and asset management capabilities. The paper also discusses practical deployment challenges and shares key insights for scaling CAM systems. The results highlight the potential of cloud-based infrastructure perception to advance both outdoor and indoor intelligent transportation systems.

artificial intelligence, cam system, cloud computing, (14 more...)

2505.21676

Country:

Asia (0.68)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Cloud Computing (1.00)
(4 more...)