MedVQA-TREE: A Multimodal Reasoning and Retrieval Framework for Sarcopenia Prediction
Moradbeiki, Pardis, Ghadiri, Nasser, Zahabi, Sayed Jalal, Wiil, Uffe Kock, Brockhattingen, Kristoffer Kittelmann, Ebrahimi, Ali
arXiv.org Artificial Intelligence
Accurate sarcopenia diagnosis via ultrasound remains challenging due to subtle imaging cues, limited labeled data, and the absence of clinical context in most models. We propose MedVQA-TREE, a multimodal framework that integrates a hierarchical image interpretation module, a gated feature-level fusion mechanism, and a novel multi-hop, multi-query retrieval strategy. The vision module includes anatomical classification, region segmentation, and graph-based spatial reasoning to capture coarse, mid-level, and fine-grained structures. A gated fusion mechanism selectively integrates visual features with textual queries, while clinical knowledge is retrieved through a UMLS-guided pipeline accessing PubMed and a sarcopenia-specific external knowledge base. MedVQA-TREE was trained and evaluated on two public MedVQA datasets (VQA-RAD and PathVQA) and a custom sarcopenia ultrasound dataset. The model achieved up to 99% diagnostic accuracy and outperformed previous state-of-the-art methods by over 10%. These results underscore the benefit of combining structured visual understanding with guided knowledge retrieval for effective AI-assisted diagnosis in sarcopenia.
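The abstract does not specify the form of the gated fusion mechanism. As an illustration only, the sketch below uses one common formulation: an element-wise sigmoid gate computed from the concatenated visual and textual features, which then blends the two vectors. The function names (`gated_fusion`, `matvec`), the weight shapes, and the blending rule are assumptions made for this example, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def gated_fusion(visual, text, gate_weights, gate_bias):
    """Blend two equal-length feature vectors with a learned gate.

    Hypothetical formulation (not taken from the paper):
        g     = sigmoid(W_g [v; t] + b_g)
        fused = g * v + (1 - g) * t
    gate_weights has shape (d, 2d) and gate_bias has length d,
    where d is the feature dimension.
    """
    if len(visual) != len(text):
        raise ValueError("visual and text features must have the same length")
    concat = list(visual) + list(text)
    pre_activation = matvec(gate_weights, concat)
    gate = [sigmoid(z + b) for z, b in zip(pre_activation, gate_bias)]
    fused = [g * v + (1.0 - g) * t for g, v, t in zip(gate, visual, text)]
    return fused, gate


# Toy example with d = 2. Zero weights and zero bias give a gate of 0.5,
# so the fused vector is the plain average of the two inputs.
visual_features = [1.0, 3.0]
text_features = [3.0, 1.0]
zero_weights = [[0.0] * 4 for _ in range(2)]
fused, gate = gated_fusion(visual_features, text_features, zero_weights, [0.0, 0.0])
print(gate)
print(fused)
```

In a trained model the gate weights would be learned, so the gate could favor visual features for some dimensions and textual features for others. A strongly positive bias pushes the gate toward 1 and the output toward the visual vector.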
Aug-28-2025
- Country:
- North America > United States
- Maryland > Montgomery County > Bethesda (0.04)
- Europe
- Denmark > Southern Denmark (0.04)
- Finland > Uusimaa
- Helsinki (0.04)
- Asia > Middle East
- Iran > Isfahan Province > Isfahan (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Information Technology (0.93)
- Health & Medicine
- Diagnostic Medicine > Imaging (1.00)
- Therapeutic Area > Oncology (0.92)
- Technology:
- Information Technology
- Sensing and Signal Processing > Image Processing (1.00)
- Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning
- Information Fusion (1.00)
- Diagnosis (0.88)
- Spatial Reasoning (0.66)
- Natural Language
- Large Language Model (1.00)
- Text Processing (0.93)
- Machine Learning
- Neural Networks > Deep Learning (1.00)
- Performance Analysis > Accuracy (0.68)