Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation
Zhang, Xi, Meng, Zaiqiao, Lever, Jake, Ho, Edmond S. L.
arXiv.org Artificial Intelligence
We introduce a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Building on previous findings that large language models (LLMs) can acquire multimodal capabilities when aligned with pretrained vision encoders, we demonstrate similar potential with chest X-ray images. This integration enhances the model's ability to understand and describe chest X-ray images. Our model combines an image encoder with a fine-tuned LLM based on the Vicuna-7B architecture, enabling it to generate different sections of a radiology report with notable accuracy. The training process involves a two-stage approach: (i) initial alignment of chest X-ray features with the LLM, followed by (ii) fine-tuning for radiology report generation.
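The two-stage schedule described above can be sketched as a small configuration. This is only an illustration under stated assumptions: the abstract names the two stages but does not say which components are updated in each, so the freezing pattern below follows the common practice in visual instruction tuning (train a connecting module first, then also the LLM). The component names `image_encoder`, `adapter`, and `llm` are hypothetical labels, not identifiers from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    objective: str
    trainable: frozenset  # names of the components updated in this stage


# Hypothetical component labels; the abstract only mentions an image encoder and an LLM.
COMPONENTS = ("image_encoder", "adapter", "llm")

# Assumption: which components are trained in each stage is NOT stated in the
# abstract. This mirrors a common visual instruction tuning setup.
STAGES = (
    Stage(
        name="alignment",
        objective="align chest X-ray features with the LLM",
        trainable=frozenset({"adapter"}),
    ),
    Stage(
        name="report_finetuning",
        objective="fine-tune for radiology report generation",
        trainable=frozenset({"adapter", "llm"}),
    ),
)


def frozen_components(stage):
    """Return the components that are left untouched during the given stage."""
    return [c for c in COMPONENTS if c not in stage.trainable]


def describe(stages=STAGES):
    """Return one human-readable summary line per training stage."""
    return [
        f"stage {i} ({s.name}): train {sorted(s.trainable)}, "
        f"freeze {frozen_components(s)}"
        for i, s in enumerate(stages, start=1)
    ]


if __name__ == "__main__":
    for line in describe():
        print(line)
```

In a real training script, each stage's `trainable` set would decide which parameter groups receive gradient updates; the actual choice should be taken from the paper rather than from this sketch.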
Dec-6-2024
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine
- Diagnostic Medicine > Imaging (1.00)
- Nuclear Medicine (1.00)
- Technology: