MedVQA-TREE: A Multimodal Reasoning and Retrieval Framework for Sarcopenia Prediction

Pardis Moradbeiki, Nasser Ghadiri, Sayed Jalal Zahabi, Uffe Kock Wiil, Kristoffer Kittelmann Brockhattingen, Ali Ebrahimi

Aug-28-2025–arXiv.org Artificial Intelligence

Accurate sarcopenia diagnosis via ultrasound remains challenging due to subtle imaging cues, limited labeled data, and the absence of clinical context in most models. We propose MedVQA-TREE, a multimodal framework that integrates a hierarchical image interpretation module, a gated feature-level fusion mechanism, and a novel multi-hop, multi-query retrieval strategy. The vision module includes anatomical classification, region segmentation, and graph-based spatial reasoning to capture coarse, mid-level, and fine-grained structures. A gated fusion mechanism selectively integrates visual features with textual queries, while clinical knowledge is retrieved through a UMLS-guided pipeline accessing PubMed and a sarcopenia-specific external knowledge base. MedVQA-TREE was trained and evaluated on two public MedVQA datasets (VQA-RAD and PathVQA) and a custom sarcopenia ultrasound dataset. The model achieved up to 99% diagnostic accuracy and outperformed previous state-of-the-art methods by over 10%. These results underscore the benefit of combining structured visual understanding with guided knowledge retrieval for effective AI-assisted diagnosis in sarcopenia.
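The abstract names a gated feature-level fusion mechanism but does not give its formula. One common form of such a gate, assumed here purely for illustration, computes a per-dimension gate g = sigmoid(W_g [v; t] + b_g) from the concatenated visual feature v and text feature t, then blends the two as g * v + (1 - g) * t. The sketch below is not the authors' implementation: the function name gated_fusion, the parameters gate_weights and gate_bias, and the assumption that both features share one dimension are all hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    visual: Sequence[float],
    textual: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> List[float]:
    """Hypothetical gated feature-level fusion (assumed form, not from the paper).

    gate  = sigmoid(W_g @ concat(visual, textual) + b_g)   # one gate per dimension
    fused = gate * visual + (1 - gate) * textual           # element-wise blend
    """
    d = len(visual)
    if len(textual) != d:
        raise ValueError("visual and textual features must have the same dimension")
    if len(gate_weights) != d or len(gate_bias) != d:
        raise ValueError("gate parameters must produce one gate value per feature")

    joint = list(visual) + list(textual)  # concatenated input [v; t], length 2d
    fused = []
    for i in range(d):
        row = gate_weights[i]
        if len(row) != 2 * d:
            raise ValueError("each gate weight row must have length 2 * d")
        # Pre-activation for dimension i: dot(W_g[i], [v; t]) + b_g[i]
        pre = sum(w * x for w, x in zip(row, joint)) + gate_bias[i]
        g = sigmoid(pre)
        # g near 1 keeps the visual feature, g near 0 keeps the textual one
        fused.append(g * visual[i] + (1.0 - g) * textual[i])
    return fused
```

With all-zero gate parameters every gate equals 0.5, so gated_fusion([1.0, 2.0], [3.0, 0.0], [[0.0] * 4] * 2, [0.0, 0.0]) returns the element-wise average [2.0, 1.0]. In a trained model these parameters would presumably be learned, letting each feature dimension lean toward whichever modality is more informative for the question.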

large language model, machine learning, natural language, ...

Country:
- North America > United States
  - Maryland > Montgomery County > Bethesda (0.04)
- Europe
  - Denmark > Southern Denmark (0.04)
  - Finland > Uusimaa
    - Helsinki (0.04)
- Asia > Middle East
  - Iran > Isfahan Province > Isfahan (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Information Technology (0.93)
- Health & Medicine
  - Diagnostic Medicine > Imaging (1.00)
  - Therapeutic Area > Oncology (0.92)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning
      - Information Fusion (1.00)
      - Diagnosis (0.88)
      - Spatial Reasoning (0.66)
    - Natural Language
      - Large Language Model (1.00)
      - Text Processing (0.93)
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Performance Analysis > Accuracy (0.68)