Shen, Quanli
Anatomical Structure-Guided Medical Vision-Language Pre-training
Li, Qingqiu, Yan, Xiaohan, Xu, Jilan, Yuan, Runtian, Zhang, Yuejie, Feng, Rui, Shen, Quanli, Zhang, Xiaobo, Wang, Shujun
Learning medical visual representations through vision-language pre-training has made remarkable progress. Despite this promising performance, the approach still faces two challenges: local alignment lacks interpretability and clinical relevance, and the internal and external representation learning of image-report pairs is insufficient. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into triplets