Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark

Xu, Li; Liu, Bo; Khan, Ameer Hamza; Fan, Lu; Wu, Xiao-Ming

Aug-24-2023–arXiv.org Artificial Intelligence

With the availability of large-scale, comprehensive, and general-purpose vision-language (VL) datasets such as MSCOCO, vision-language pre-training (VLP) has become an active area of research and has proven effective for various VL tasks such as visual question answering. However, studies on VLP in the medical domain remain scarce. To provide a comprehensive perspective on VLP for medical VL tasks, we conduct a thorough experimental analysis of the key factors that may affect the performance of VLP with a unified vision-language Transformer. To support sound and quick pre-training decisions, we propose RadioGraphy Captions (RGC), a high-quality, multi-modality radiographic dataset containing 18,434 image-caption pairs collected from MedPix, an open-access online database. RGC can be used as a pre-training dataset or as a new benchmark for medical report generation and medical image-text retrieval. By utilizing RGC and other available datasets for pre-training, we derive several key insights that can guide future medical VLP research and establish new strong baselines for various medical VL tasks.



Country:
- North America > United States (0.14)
- Europe > France
  - Grand Est > Bas-Rhin > Strasbourg (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report > Experimental Study (0.46)

Industry:
- Health & Medicine
  - Therapeutic Area (1.00)
  - Nuclear Medicine (1.00)
  - Diagnostic Medicine > Imaging (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Question Answering (0.35)
