Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning

Naz, Zubia, Asghar, Farhan, Hussain, Muhammad Ishfaq, Hadadi, Yahya, Rafique, Muhammad Aasim, Choi, Wookjin, Jeon, Moongu

Nov-14-2025–arXiv.org Artificial Intelligence

Automated medical image captioning translates complex radiological images into diagnostic narratives that can support reporting workflows. We present a Swin-BART encoder-decoder system with a lightweight regional attention module that amplifies diagnostically salient regions before cross-attention. Trained and evaluated on ROCO, our model achieves state-of-the-art semantic fidelity while remaining compact and interpretable. We report results as mean$\pm$std over three seeds and include $95\%$ confidence intervals. Compared with baselines, our approach improves ROUGE (proposed 0.603, ResNet-CNN 0.356, BLIP2-OPT 0.255) and BERTScore (proposed 0.807, BLIP2-OPT 0.645, ResNet-CNN 0.623), with competitive BLEU, CIDEr, and METEOR. We further provide ablations (regional attention on/off and token-count sweep), per-modality analysis (CT/MRI/X-ray), paired significance tests, and qualitative heatmaps that visualize the regions driving each description. Decoding uses beam search (beam size $=4$), length penalty $=1.1$, $no\_repeat\_ngram\_size$ $=3$, and max length $=128$. The proposed design yields accurate, clinically phrased captions and transparent regional attributions, supporting safe research use with a human in the loop.

caption, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

Nov-14-2025

arXiv.org PDF

Add feedback

Country:
- Asia
  - South Korea (0.15)
  - Middle East > Saudi Arabia (0.14)

Genre:
- Research Report > Experimental Study (0.48)

Industry:
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (0.68)
  - Natural Language > Large Language Model (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found