Fusion-Augmented Large Language Models: Boosting Diagnostic Trustworthiness via Model Consensus

Siam, Md Kamrul, Faruk, Md Jobair Hossain, Cheng, Jerry Q., Gu, Huanying

Oct-21-2025–arXiv.org Artificial Intelligence

Abstract--This study presents a novel multi-model fusion framework leveraging two state-of-the-art large language models (LLMs), ChatGPT and Claude, to enhance the reliability of chest X-ray interpretation on the CheXpert dataset. From the full CheXpert corpus of 224,316 chest radiographs, we randomly selected 234 radiologist-annotated studies to evaluate unimodal performance using image-only prompts. In this setting, ChatGPT and Claude achieved diagnostic accuracies of 62.8% and 76.9%, respectively. A similarity-based consensus approach, using a 95% output similarity threshold, improved accuracy to 77.6%. T o assess the impact of multimodal inputs, we then generated synthetic clinical notes following the MIMIC-CXR template and evaluated a separate subset of 50 randomly selected cases paired with both images and synthetic text. On this multimodal cohort, performance improved to 84% for ChatGPT and 76% for Claude, while consensus accuracy reached 91.3%. Across both experimental conditions, agreement-based fusion consistently outperformed individual models. These findings highlight the utility of integrating complementary modalities and using output-level consensus to improve the trustworthiness and clinical utility of AI-assisted radiological diagnosis, offering a practical path to reduce diagnostic errors with minimal computational overhead.

accuracy, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

Oct-21-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.93)

Industry:
- Health & Medicine
  - Nuclear Medicine (1.00)
  - Diagnostic Medicine > Imaging (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found