Eyes on the Image: Gaze Supervised Multimodal Learning for Chest X-ray Diagnosis and Report Generation
Riju, Tanjim Islam; Anwar, Shuchismita; Joy, Saman Sarker; Sadeque, Farig; Shatabda, Swakkhar
arXiv.org Artificial Intelligence
We propose a two-stage multimodal framework that enhances disease classification and region-aware radiology report generation from chest X-rays, leveraging the MIMIC-Eye dataset. In the first stage, we introduce a gaze-guided contrastive learning architecture for disease classification. It integrates visual features, clinical labels, bounding boxes, and radiologist eye-tracking signals and is equipped with a novel multi-term gaze-attention loss combining MSE, KL divergence, correlation, and center-of-mass alignment. Incorporating fixations improves the F1 score from 0.597 to 0.631 (a 5.70% relative gain) and the AUC from 0.821 to 0.849 (a 3.41% relative gain), while also improving precision and recall, highlighting the effectiveness of gaze-informed attention supervision. In the second stage, we present a modular report generation pipeline that extracts confidence-weighted diagnostic keywords, maps them to anatomical regions using a curated dictionary constructed from domain-specific priors, and generates region-aligned sentences via structured prompts. This pipeline improves report quality as measured by clinical keyword recall and ROUGE overlap. Our results demonstrate that integrating gaze data improves both classification performance and the interpretability of generated medical reports.
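The abstract names the four terms of the gaze-attention loss but gives neither their exact form nor their weighting. The sketch below is one plausible reading, not the authors' implementation. It treats the model attention map and the gaze heatmap as 2-D grids normalized to sum to 1 and combines the four terms with equal weights. The KL direction (gaze relative to attention), the use of 1 minus the Pearson correlation, the diagonal-normalized center-of-mass distance, and the names `gaze_attention_loss`, `gaussian_map`, and `weights` are all assumptions made for illustration.

```python
import math

EPS = 1e-8  # numerical guard; an assumed value, not taken from the paper


def normalize(grid):
    """Clip negatives and rescale a 2-D grid (list of rows) so it sums to 1."""
    clipped = [[max(v, 0.0) for v in row] for row in grid]
    total = sum(map(sum, clipped)) + EPS
    return [[v / total for v in row] for row in clipped]


def center_of_mass(prob):
    """Expected (row, col) position under a normalized map."""
    r = sum(i * v for i, row in enumerate(prob) for v in row)
    c = sum(j * v for row in prob for j, v in enumerate(row))
    return r, c


def gaze_attention_loss(attn, gaze, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of MSE, KL, correlation, and center-of-mass terms.

    `attn` and `gaze` are same-shaped 2-D grids; `weights` is hypothetical.
    """
    p_grid, q_grid = normalize(attn), normalize(gaze)
    p = [v for row in p_grid for v in row]
    q = [v for row in q_grid for v in row]
    n = len(p)

    # Term 1: mean squared error between the two normalized maps.
    mse = sum((a - b) ** 2 for a, b in zip(p, q)) / n

    # Term 2: KL(gaze || attention); the direction is an assumption.
    kl = sum(b * (math.log(b + EPS) - math.log(a + EPS)) for a, b in zip(p, q))

    # Term 3: 1 - Pearson correlation, so well-aligned maps give ~0.
    p_mean, q_mean = sum(p) / n, sum(q) / n
    cov = sum((a - p_mean) * (b - q_mean) for a, b in zip(p, q))
    p_var = sum((a - p_mean) ** 2 for a in p)
    q_var = sum((b - q_mean) ** 2 for b in q)
    corr_term = 1.0 - cov / (math.sqrt(p_var * q_var) + EPS)

    # Term 4: distance between centers of mass, scaled by the grid diagonal.
    (pr, pc), (qr, qc) = center_of_mass(p_grid), center_of_mass(q_grid)
    diagonal = math.hypot(len(attn), len(attn[0]))
    com = math.hypot(pr - qr, pc - qc) / diagonal

    w_mse, w_kl, w_corr, w_com = weights
    return w_mse * mse + w_kl * kl + w_corr * corr_term + w_com * com


def gaussian_map(height, width, center, sigma):
    """Synthetic heatmap used only to exercise the loss."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    gaze = gaussian_map(16, 16, (8, 8), 2.0)
    attn = gaussian_map(16, 16, (4, 11), 2.0)
    print(gaze_attention_loss(attn, gaze))
```

In this sketch every term is close to zero when the attention map matches the gaze heatmap and grows as the two diverge. A training implementation would express the same terms with differentiable tensor operations and tune the weights.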
Aug-19-2025
- Country:
- Europe (1.00)
- North America > United States
- Minnesota (0.28)
- Genre:
- Research Report
- New Finding (0.68)
- Experimental Study (0.68)
- Industry:
- Health & Medicine
- Nuclear Medicine (1.00)
- Diagnostic Medicine > Imaging (1.00)
- Therapeutic Area (0.96)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Natural Language > Large Language Model (0.70)
- Machine Learning
- Neural Networks > Deep Learning (0.47)
- Performance Analysis > Accuracy (0.34)