RAU: Reference-based Anatomical Understanding with Vision Language Models
Li, Yiwei; Liu, Yikang; Guo, Jiaqi; Zhao, Lin; Zhang, Zheyuan; Chen, Xiao; Mailhe, Boris; Mukherjee, Ankush; Chen, Terrence; Sun, Shanhui
arXiv.org Artificial Intelligence
Anatomical understanding through deep learning is critical for automatic report generation, intra-operative navigation, and organ localization in medical imaging; however, its progress is constrained by the scarcity of expert-labeled data. A promising remedy is to leverage an annotated reference image to guide the interpretation of an unlabeled target. Although recent vision-language models (VLMs) exhibit non-trivial visual reasoning, their reference-based understanding and fine-grained localization remain limited. We introduce RAU, a framework for reference-based anatomical understanding with VLMs. We first show that a VLM trained on a moderately sized dataset learns to identify anatomical regions through relative spatial reasoning between reference and target images. We validate this capability through visual question answering (VQA) and bounding box prediction. Next, we demonstrate that the VLM-derived spatial cues can be seamlessly integrated with the fine-grained segmentation capability of SAM2, enabling localization and pixel-level segmentation of small anatomical regions, such as vessel segments. Across two in-distribution and two out-of-distribution datasets, RAU consistently outperforms a SAM2 fine-tuning baseline using the same memory setup, yielding more accurate segmentations and more reliable localization. More importantly, its strong generalization ability makes it scalable to out-of-distribution datasets, a property crucial for medical image applications. To the best of our knowledge, RAU is the first to explore the capability of VLMs for reference-based identification, localization, and segmentation of anatomical structures in medical images. Its promising performance highlights the potential of VLM-driven approaches for anatomical understanding in automated clinical workflows.
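The abstract reports bounding box prediction and pixel-level segmentation but does not name the metrics used to score them. As an illustrative sketch only, assuming boxes are scored by intersection-over-union (IoU) and masks by the Dice coefficient (common choices, not confirmed by the source), the two quantities could be computed as follows. The function names `box_iou` and `dice` are hypothetical and are not taken from the paper.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dice(pred, truth):
    """Dice coefficient of two binary masks given as nested lists of 0/1."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Assumption: two empty masks are treated as a perfect match.
    return 2 * inter / total if total else 1.0
```

Under these assumptions, a predicted box for a target image would be compared against its annotated box with `box_iou`, and a predicted mask against the ground-truth mask with `dice`.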
Sep-29-2025
- Country:
  - North America > United States (0.46)
- Genre:
  - Research Report (1.00)
  - Overview (0.93)
- Industry:
  - Health & Medicine
    - Therapeutic Area (1.00)
    - Health Care Technology (1.00)
    - Diagnostic Medicine > Imaging (1.00)
- Technology: