AITopics | fusion model

Collaborating Authors

fusion model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On Single Source Robustness in Deep Fusion Models

Taewan Kim, Joydeep Ghosh

Neural Information Processing SystemsFeb-12-2026, 19:16:45 GMT

One natural way of improving a model's performance is to make use of multiple input sources relevant to a given task so that enough information can be extracted to build strong features.

artificial intelligence, machine learning, robustness, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)

Add feedback

Exploring Fusion Strategies for Multimodal Vision-Language Systems

Willis, Regan, Bakos, Jason

arXiv.org Artificial IntelligenceDec-1-2025

Modern machine learning models often combine multiple input streams of data to more accurately capture the information that informs their decisions. In multimodal machine learning, choosing the strategy for fusing data together requires careful consideration of the application's accuracy and latency requirements, as fusing the data at earlier or later stages in the model architecture can lead to performance changes in accuracy and latency. T o demonstrate this trade-off, we investigate different fusion strategies using a hybrid BERT and vision network framework that integrates image and text data. W e explore two different vision networks: MobileNetV2 and ViT. W e propose three models for each vision network, which fuse data at late, intermediate, and early stages in the architecture. W e evaluate the proposed models on the CMU-MOSI dataset and benchmark their latency on an NVIDIA Jetson Orin AGX. Our experimental results demonstrate that while late fusion yields the highest accuracy, early fusion offers the lowest inference latency. W e describe the three proposed model architectures and discuss the accuracy and latency trade-offs, concluding that data fusion earlier in the model architecture results in faster inference times at the cost of accuracy.

artificial intelligence, information fusion, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.21889

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre:

Research Report (1.00)
Overview (0.93)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.91)

Add feedback

LLMCARE: early detection of cognitive impairment via transformer models enhanced by LLM-generated synthetic data

Zolnour, Ali, Azadmaleki, Hossein, Haghbin, Yasaman, Taherinezhad, Fatemeh, Nezhad, Mohamad Javad Momeni, Rashidi, Sina, Khani, Masoud, Taleban, AmirSajjad, Sani, Samin Mahdizadeh, Dadkhah, Maryam, Noble, James M., Bakken, Suzanne, Yaghoobzadeh, Yadollah, Vahabie, Abdol-Hossein, Rouhizadeh, Masoud, Zolnoori, Maryam

arXiv.org Artificial IntelligenceNov-14-2025

Alzheimer's disease and related dementias(ADRD) affect nearly five million older adults in the United States, yet more than half remain undiagnosed. Speech-based natural language processing(NLP) offers a scalable approach for detecting early cognitive decline through subtle linguistic markers that may precede clinical diagnosis. This study develops and evaluates a speech-based screening pipeline integrating transformer embeddings with handcrafted linguistic features, synthetic augmentation using large language models(LLMs), and benchmarking of unimodal and multimodal classifiers. External validation assessed generalizability to a MCI-only cohort. Transcripts were drawn from the ADReSSo 2021 benchmark dataset(n=237, Pitt Corpus) and the DementiaBank Delaware corpus(n=205, MCI vs. controls). Ten transformer models were tested under three fine-tuning strategies. A late-fusion model combined embeddings from the top transformer with 110 linguistic features. Five LLMs(LLaMA8B/70B, MedAlpaca7B, Ministral8B,GPT-4o) generated label-conditioned synthetic speech for augmentation, and three multimodal LLMs(GPT-4o,Qwen-Omni,Phi-4) were evaluated in zero-shot and fine-tuned modes. On ADReSSo, the fusion model achieved F1=83.3(AUC=89.5), outperforming transformer-only and linguistic baselines. MedAlpaca7B augmentation(2x) improved F1=85.7, though larger scales reduced gains. Fine-tuning boosted unimodal LLMs(MedAlpaca7B F1=47.7=>78.7), while multimodal models performed lower (Phi-4=71.6;GPT-4o=67.6). On Delaware, the fusion plus 1x MedAlpaca7B model achieved F1=72.8(AUC=69.6). Integrating transformer and linguistic features enhances ADRD detection. LLM-based augmentation improves data efficiency but yields diminishing returns, while current multimodal models remain limited. Validation on an independent MCI cohort supports the pipeline's potential for scalable, clinically relevant early screening.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.3389/frai.2025.1669896

2508.10027

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection

Srikanth, Maya, Chen, Run, Hirschberg, Julia

arXiv.org Artificial IntelligenceNov-12-2025

Multimodal models play a key role in empathy detection, but their performance can suffer when modalities provide conflicting cues. To understand these failures, we examine cases where unimodal and multimodal predictions diverge. Using fine-tuned models for text, audio, and video, along with a gated fusion model, we find that such disagreements often reflect underlying ambiguity, as evidenced by annotator uncertainty. Our analysis shows that dominant signals in one modality can mislead fusion when unsupported by others. We also observe that humans, like models, do not consistently benefit from multimodal input. These insights position disagreement as a useful diagnostic signal for identifying challenging examples and improving empathy system robustness.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.13979

Country:

Europe (0.68)
Asia (0.46)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Explainable Deep Neural Network for Multimodal ECG Signals: Intermediate vs Late Fusion

Oladunni, Timothy, Aneni, Ehimen

arXiv.org Artificial IntelligenceOct-14-2025

The limitations of unimodal deep learning models, particularly their tendency to overfit and limited generalizability, have renewed interest in multimodal fusion strategies. Multimodal deep neural networks (MDNN) have the capability of integrating diverse data domains and offer a promising solution for robust and accurate predictions. However, the optimal fusion strategy, intermediate fusion (feature-level) versus late fusion (decision-level) remains insufficiently examined, especially in high-stakes clinical contexts such as ECG-based cardiovascular disease (CVD) classification. This study investigates the comparative effectiveness of intermediate and late fusion strategies using ECG signals across three domains: time, frequency, and time-frequency. A series of experiments were conducted to identify the highest-performing fusion architecture. Results demonstrate that intermediate fusion consistently outperformed late fusion, achieving a peak accuracy of 97 percent, with Cohen's d > 0.8 relative to standalone models and d = 0.40 compared to late fusion. Interpretability analyses using saliency maps reveal that both models align with the discretized ECG signals. Statistical dependency between the discretized ECG signals and corresponding saliency maps for each class was confirmed using Mutual Information (MI). The proposed ECG domain-based multimodal model offers superior predictive capability and enhanced explainability, crucial attributes in medical AI applications, surpassing state-of-the-art models.

artificial intelligence, machine learning, transformer, (18 more...)

arXiv.org Artificial Intelligence

2508.11666

Country:

North America > United States (0.46)
Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On Single Source Robustness in Deep Fusion Models

Taewan Kim, Joydeep Ghosh

Neural Information Processing SystemsOct-3-2025, 03:11:45 GMT

One natural way of improving a model's performance is to make use of multiple input sources relevant to a given task so that enough information can be extracted to build strong features.

artificial intelligence, information fusion, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

Add feedback

Exploring Urban Factors with Autoencoders: Relationship Between Static and Dynamic Features

Pocco, Ximena, Hassan, Waqar, Salinas, Karelia, Molchanov, Vladimir, Nonato, Luis G.

arXiv.org Artificial IntelligenceSep-9-2025

Urban analytics utilizes extensive datasets with diverse urban information to simulate, predict trends, and uncover complex patterns within cities. While these data enables advanced analysis, it also presents challenges due to its granularity, heterogeneity, and multimodality. To address these challenges, visual analytics tools have been developed to support the exploration of latent representations of fused heterogeneous and multimodal data, discretized at a street-level of detail. However, visualization-assisted tools seldom explore the extent to which fused data can offer deeper insights than examining each data source independently within an integrated visualization framework. In this work, we developed a visualization-assisted framework to analyze whether fused latent data representations are more effective than separate representations in uncovering patterns from dynamic and static urban data. The analysis reveals that combined latent representations produce more structured patterns, while separate ones are useful in particular cases.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.06167

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)

Add feedback

Opportunistic Screening for Pancreatic Cancer using Computed Tomography Imaging and Radiology Reports

Le, David, Correa-Medero, Ramon, Tariq, Amara, Patel, Bhavik, Yano, Motoyo, Banerjee, Imon

arXiv.org Artificial IntelligenceMar-31-2025

Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer, with most cases diagnosed at stage IV and a five-year overall survival rate below 5%. Early detection and prognosis modeling are crucial for improving patient outcomes and guiding early intervention strategies. In this study, we developed and evaluated a deep learning fusion model that integrates radiology reports and CT imaging to predict PDAC risk. The model achieved a concordance index (C-index) of 0.6750 (95% CI: 0.6429, 0.7121) and 0.6435 (95% CI: 0.6055, 0.6789) on the internal and external dataset, respectively, for 5-year survival risk estimation. Kaplan-Meier analysis demonstrated significant separation (p<0.0001) between the low and high risk groups predicted by the fusion model. These findings highlight the potential of deep learning-based survival models in leveraging clinical and imaging data for pancreatic cancer.

artificial intelligence, diagnosis, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2504.00232

Country:

North America > United States > Arizona > Maricopa County > Phoenix (0.14)
North America > United States > Florida > Duval County > Jacksonville (0.05)
North America > United States > Minnesota > Olmsted County > Rochester (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (1.00)
Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection

Oo, Heng Yim Nicole, Lee, Min Hun, Lim, Jeong Hoon

arXiv.org Artificial IntelligenceMar-13-2025

Algorithmic detection of facial palsy offers the potential to improve current practices, which usually involve labor-intensive and subjective assessments by clinicians. In this paper, we present a multimodal fusion-based deep learning model that utilizes an MLP mixer-based model to process unstructured data (i.e. RGB images or images with facial line segments) and a feed-forward neural network to process structured data (i.e. facial landmark coordinates, features of facial expressions, or handcrafted features) for detecting facial palsy. We then contribute to a study to analyze the effect of different data modalities and the benefits of a multimodal fusion-based approach using videos of 20 facial palsy patients and 20 healthy subjects. Our multimodal fusion model achieved 96.00 F1, which is significantly higher than the feed-forward neural network trained on handcrafted features alone (82.80 F1) and an MLP mixer-based model trained on raw RGB images (89.00 F1).

data modality, facial palsy, rgb image, (13 more...)

arXiv.org Artificial Intelligence

2503.10371

Country:

Asia > Singapore > Central Region > Singapore (0.04)
North America > United States (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Filters

Collaborating Authors

fusion model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

On Single Source Robustness in Deep Fusion Models

Exploring Fusion Strategies for Multimodal Vision-Language Systems

LLMCARE: early detection of cognitive impairment via transformer models enhanced by LLM-generated synthetic data

Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection

Explainable Deep Neural Network for Multimodal ECG Signals: Intermediate vs Late Fusion

On Single Source Robustness in Deep Fusion Models

8420d359404024567b5aefda1231af24-AuthorFeedback.pdf

Exploring Urban Factors with Autoencoders: Relationship Between Static and Dynamic Features

Opportunistic Screening for Pancreatic Cancer using Computed Tomography Imaging and Radiology Reports

A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection