Architecting Clinical Collaboration: Multi-Agent Reasoning Systems for Multimodal Medical VQA

Thakrar, Karishma; Basavatia, Shreyas; Daftardar, Akshay

Aug-27-2025–arXiv.org Artificial Intelligence

Dermatological care via telemedicine often lacks the rich context of in-person visits. Clinicians must make diagnoses based on a handful of images and brief descriptions, without the benefit of physical exams, second opinions, or reference materials. While many medical AI systems attempt to bridge these gaps with domain-specific fine-tuning, this work hypothesized that mimicking clinical reasoning processes could offer a more effective path forward. This study tested seven vision-language models on medical visual question answering across six configurations: baseline models, fine-tuned variants, and each of these augmented with either a reasoning layer that combines multiple model perspectives (analogous to peer consultation) or retrieval-augmented generation that incorporates medical literature at inference time (analogous to reference-checking). Fine-tuning degraded performance in four of seven models, with an average 30% decrease, and baseline models collapsed on test data. Clinically inspired architectures, meanwhile, achieved up to 70% accuracy, maintaining performance on unseen data while generating explainable, literature-grounded outputs critical for clinical adoption. These findings demonstrate that medical AI succeeds by reconstructing the collaborative and evidence-based practices fundamental to clinical diagnosis.

Fine-tuning large models on medical data, the standard approach to medical AI, assumes that domain exposure produces clinical competence [1]. Yet dermatology models show 15% performance drops in real-world settings [2], and catastrophic forgetting causes models to generate outputs exclusively from their training data [3]. This brittleness suggests a fundamental mismatch between current approaches and clinical reasoning. Additionally, physician groups achieve 85.6% diagnostic accuracy versus 62.5% for individuals [4], as collaboration reduces cognitive load and bias [5]. However, logistical constraints force physicians to work alone, a problem telemedicine intensifies by eliminating physical exams, peer consultation, and immediate reference access [6].
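The six configurations described above follow from crossing two model states (baseline, fine-tuned) with three augmentation choices (none, reasoning layer, retrieval-augmented generation). The sketch below enumerates that grid. It also includes `plurality_vote`, a deliberately simple and hypothetical stand-in for combining several models' answers. The listing does not describe how the authors' reasoning layer actually works, so that function is an assumption for illustration only, as are all names and the example answers.

```python
from collections import Counter
from itertools import product

# Two model states crossed with three augmentation choices, per the abstract.
MODEL_STATES = ["baseline", "fine-tuned"]
AUGMENTATIONS = ["none", "reasoning layer", "RAG"]


def configurations():
    """Return the (model state, augmentation) pairs implied by the abstract."""
    return list(product(MODEL_STATES, AUGMENTATIONS))


def plurality_vote(answers):
    """Hypothetical stand-in for combining model perspectives.

    Picks the most common answer; ties go to the answer that appears
    first in the input. This is NOT the paper's reasoning layer.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:
        if counts[answer] == best:
            return answer


if __name__ == "__main__":
    for state, augmentation in configurations():
        print(f"{state} + {augmentation}")
    # Made-up answers from three imaginary models.
    print(plurality_vote(["eczema", "psoriasis", "eczema"]))
```

With seven models, this grid gives 7 × 6 = 42 model-configuration pairs, assuming every model is run in every configuration.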

large language model, machine learning, natural language

Country:
- North America > United States > Georgia > Fulton County > Atlanta (0.40)

Genre:
- Research Report > New Finding (0.48)

Industry:
- Health & Medicine
  - Diagnostic Medicine (1.00)
  - Health Care Technology > Telehealth (1.00)
  - Therapeutic Area
    - Dermatology (0.89)
    - Oncology (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Cognitive Science > Problem Solving (0.85)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language
    - Chatbot (0.89)
    - Large Language Model (1.00)
  - Representation & Reasoning > Diagnosis (0.88)