Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
Zheng, Shunjie-Fabian, Lee, Hyeonjun, Kooi, Thijs, Diba, Ali
arXiv.org Artificial Intelligence
Breast cancer remains the most commonly diagnosed malignancy among women in the developed world, and early detection through mammography screening plays a pivotal role in reducing mortality. While computer-aided diagnosis (CAD) systems have shown promise in assisting radiologists, existing approaches face critical limitations in clinical deployment, particularly in handling the nuanced interpretation of multi-modal data and in practical feasibility, since many require prior clinical history. This study introduces a novel framework that synergistically combines visual features from 2D mammograms with structured textual descriptors, derived from easily accessible clinical metadata and synthesized radiological reports, through innovative tokenization modules. We demonstrate that strategically integrating convolutional neural networks (ConvNets) with language representations outperforms vision transformer-based models while handling high-resolution images and enabling practical deployment across diverse populations. Evaluated on screening mammograms from multi-national cohorts, our multi-modal approach achieves superior performance in cancer detection and calcification identification compared to unimodal baselines, with particular improvements. The proposed method establishes a new paradigm for developing clinically viable VLM-based CAD systems that leverage imaging data and contextual patient information through effective fusion mechanisms.
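The abstract does not specify how the tokenization modules or the fusion mechanism work. As a rough illustration only, the sketch below assumes that clinical metadata is rendered as a short text descriptor, tokenized, embedded, mean-pooled, and then concatenated with a pooled ConvNet image feature before a linear classification head. All function names (`metadata_to_descriptor`, `tokenize`, `embed_tokens`, `fuse_and_score`), the metadata fields, and concatenation as the fusion step are hypothetical; seeded random numbers stand in for learned parameters.

```python
import math
import random
from typing import Dict, List, Tuple


def metadata_to_descriptor(meta: Dict[str, str]) -> str:
    """Render structured metadata as text (hypothetical template, not the paper's)."""
    # Sort keys so identical metadata always produces identical text.
    return "; ".join(f"{k}: {meta[k]}" for k in sorted(meta))


def tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Toy word-level tokenizer; unseen words are added to the vocabulary."""
    ids = []
    for word in text.lower().replace(";", " ").replace(":", " ").split():
        if word not in vocab:
            vocab[word] = len(vocab)  # grow the toy vocabulary on the fly
        ids.append(vocab[word])
    return ids


def embed_tokens(ids: List[int], dim: int, seed: int = 0) -> List[float]:
    """Deterministic pseudo-random embedding per token id, mean-pooled.

    Stands in for a learned language encoder (assumption).
    """
    if not ids:
        return [0.0] * dim
    vecs = []
    for t in ids:
        rng = random.Random(seed * 1_000_003 + t)
        vecs.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    # Average each embedding dimension across tokens.
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fuse_and_score(image_feat: List[float], text_feat: List[float],
                   seed: int = 0) -> Tuple[List[float], float]:
    """Concatenate image and text features, apply a linear head and a sigmoid."""
    fused = image_feat + text_feat  # late fusion by concatenation (assumed)
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 0.1) for _ in fused]  # stand-in for trained weights
    logit = sum(w * x for w, x in zip(weights, fused))
    return fused, 1.0 / (1.0 + math.exp(-logit))


# Usage with made-up metadata and a placeholder ConvNet feature vector.
meta = {"age": "62", "breast density": "C", "view": "MLO"}
descriptor = metadata_to_descriptor(meta)
print(descriptor)  # prints "age: 62; breast density: C; view: MLO"

vocab: Dict[str, int] = {}
token_ids = tokenize(descriptor, vocab)
text_feat = embed_tokens(token_ids, dim=4)
image_feat = [0.5] * 8  # placeholder for pooled ConvNet features
fused, prob = fuse_and_score(image_feat, text_feat)
```

Concatenation is only the simplest possible fusion; cross-attention or gated fusion are common alternatives, and the abstract does not say which the authors use.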
Oct-30-2025
- Country:
  - Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Genre:
  - Research Report > New Finding (0.48)
- Industry:
  - Health & Medicine
    - Diagnostic Medicine > Imaging (1.00)
    - Therapeutic Area > Oncology
      - Breast Cancer (0.96)