RadVLM: A Multitask Conversational Vision-Language Model for Radiology

Deperrois, Nicolas, Matsuo, Hidetoshi, Ruipérez-Campillo, Samuel, Vandenhirtz, Moritz, Laguna, Sonia, Ryser, Alain, Fujimoto, Koji, Nishio, Mizuho, Sutter, Thomas M., Vogt, Julia E., Kluckert, Jonas, Frauenfelder, Thomas, Blüthgen, Christian, Nooralahzadeh, Farhad, Krauthammer, Michael

arXiv.org Artificial Intelligence 

X-rays have played a fundamental role in medicine since their discovery in 1895 (Röntgen, 1895), and continue to be the most frequently used medical imaging modality worldwide due to their convenience and cost-effectiveness (Akhter et al., 2023). Chest X-ray (CXR) remains the most commonly performed radiological exam globally and is particularly important for diagnosing and monitoring thoracic conditions such as pneumonia, heart failure, and lung cancer (Çallı et al., 2021). Problematically, the growing volume of CXRs and other imaging studies in recent years has led to a reduction in the time available for radiologists to thoroughly evaluate each case (Peng et al., 2022). As a result, in many countries, the responsibility of interpreting CXRs is often transferred to non-radiology physicians, who typically possess less specialized training and experience. This shift increases the risk of diagnostic errors or misinterpretations (Shammari et al., 2021; Peng et al., 2022). The shortage of trained personnel for CXR interpretation has led to the exploration of automated agents to assist physicians in diagnostic tasks. In recent years, various deep learning models have shown promise in clinical applications, such as the detection of COVID-19 pneumonia (Nishio et al., 2020) or pulmonary nodules (Homayounieh et al., 2021). Another extensively studied task is the automated generation of free-text reports from CXR images using transformer-based architectures (Nooralahzadeh et al., 2021; Yang et al., 2023; Hyland et al., 2023; Chaves et al., 2024). These models can provide preliminary drafts summarizing key observations from the CXR, offering a potential enhancement to the diagnostic workflow.