SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models

Kapadnis, Manav Nitin, Patnaik, Sohan, Nandy, Abhilash, Ray, Sourjyadip, Goyal, Pawan, Sheet, Debdoot

Apr-27-2024–arXiv.org Artificial Intelligence

Radiology Report Generation (R2Gen) demonstrates how Multi-modal Large Language Models (MLLMs) can automate the creation of accurate and coherent radiological reports. Existing methods often hallucinate details in text-based reports that don't accurately reflect the image content. To mitigate this, we introduce a novel strategy, SERPENT-VLM (SElf Refining Radiology RePort GENeraTion using Vision Language Models), which improves the R2Gen task by integrating a self-refining mechanism into the MLLM framework. We employ a unique self-supervised loss that leverages similarity between pooled image representations and the contextual representations of the generated radiological text, alongside the standard Causal Language Modeling objective, to refine image-text representations. This allows the model to scrutinize and align the generated text through dynamic interaction between a given image and the generated text, therefore reducing hallucination and continuously enhancing nuanced report generation. SERPENT-VLM outperforms existing baselines such as LLaVA-Med, BiomedGPT, etc., achieving SoTA performance on the IU X-ray and Radiology Objects in COntext (ROCO) datasets, and also proves to be robust against noisy images. A qualitative case study emphasizes the significant advancements towards more sophisticated MLLM frameworks for R2Gen, opening paths for further research into self-supervised refinement in the medical imaging domain.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Apr-27-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.46)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine
  - Diagnostic Medicine > Imaging (1.00)
  - Nuclear Medicine (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
  - Natural Language > Large Language Model (1.00)
  - Vision (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found