RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training

Yuan, Zheng; Jin, Qiao; Tan, Chuanqi; Zhao, Zhengyun; Yuan, Hongyi; Huang, Fei; Huang, Songfang

Mar-1-2023, arXiv.org Artificial Intelligence

Vision-and-language multi-modal pretraining and fine-tuning have shown great success in visual question answering (VQA). Compared with general-domain VQA, biomedical VQA suffers from limited data. In this paper, we propose RAMM, a retrieval-augmented pretrain-and-finetune paradigm for biomedical VQA that addresses this data limitation. Specifically, we collect a new biomedical dataset named PMCPM, which offers patient-based image-text pairs from PubMed covering diverse patient situations. We then pretrain a biomedical multi-modal model to learn visual and textual representations of image-text pairs and align these representations with an image-text contrastive (ITC) objective. Finally, to make better use of the limited data, we retrieve similar image-text pairs from the pretraining datasets based on ITC and introduce a novel retrieval-attention module that fuses the representation of the image and the question with the retrieved images and texts. Experiments demonstrate that our retrieval-augmented pretrain-and-finetune paradigm obtains state-of-the-art performance on the Med-VQA2019, Med-VQA2021, VQARAD, and SLAKE datasets. Further analysis shows that the proposed RAMM and PMCPM enhance biomedical VQA performance compared with previous resources and methods. We will open-source our dataset, code, and pretrained model.
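
The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that image-text pairs have already been embedded into a shared space by an ITC-trained encoder, ranks candidates by cosine similarity, and then fuses the query with the retrieved embeddings using a plain softmax attention. The function names `retrieve_top_k` and `attend_over_retrieved`, the toy two-dimensional embeddings, and the residual fusion are all hypothetical choices made for illustration; the generic attention here is a stand-in and is not the paper's retrieval-attention module.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_emb, corpus_embs, k=2):
    """Return [(index, similarity)] for the k corpus embeddings closest to the query.

    Assumption: embeddings come from a contrastively trained encoder, so
    cosine similarity is a meaningful ranking score.
    """
    scored = [(cosine(query_emb, emb), i) for i, emb in enumerate(corpus_embs)]
    # Highest similarity first; ties broken by lower index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(i, s) for s, i in scored[:k]]


def attend_over_retrieved(query_emb, retrieved_embs):
    """Fuse the query with retrieved embeddings via scaled dot-product attention.

    Hypothetical stand-in for a retrieval-attention module: the query attends
    over the retrieved vectors and the attended context is added residually.
    """
    dim = len(query_emb)
    scores = [
        sum(q * r for q, r in zip(query_emb, emb)) / math.sqrt(dim)
        for emb in retrieved_embs
    ]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * emb[j] for w, emb in zip(weights, retrieved_embs))
        for j in range(dim)
    ]
    fused = [q + c for q, c in zip(query_emb, context)]
    return fused, weights


# Toy usage with made-up 2-D embeddings.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.1]
hits = retrieve_top_k(query, corpus, k=2)
fused, weights = attend_over_retrieved(query, [corpus[i] for i, _ in hits])
```

In a real system the corpus embeddings would be precomputed and stored in an approximate nearest-neighbour index instead of being scanned linearly; the linear scan is used here only to keep the sketch self-contained.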

machine learning, natural language, question answering


Country:
- North America > United States
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
- Europe
  - Romania > București - Ilfov Development Region
    - Municipality of Bucharest > Bucharest (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Middle East > Israel (0.04)
  - China > Hong Kong (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.04)

Genre:
- Research Report (0.64)

Industry:
- Health & Medicine > Diagnostic Medicine > Imaging (0.94)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (0.68)
  - Natural Language > Question Answering (0.62)
