Process Reward Models for Sentence-Level Verification of LVLM Radiology Reports
Alois Thomas, Maya Varma, Jean-Benoit Delbrouck, Curtis P. Langlotz
arXiv.org Artificial Intelligence
Automating radiology report generation with Large Vision-Language Models (LVLMs) holds great potential, yet these models often produce clinically critical hallucinations, posing serious risks. Existing hallucination detection methods frequently lack the necessary sentence-level granularity or robust generalization across different LVLM generators. We introduce a novel approach: a sentence-level Process Reward Model (PRM) adapted for this vision-language task. Our PRM predicts the factual correctness of each generated sentence, conditioned on clinical context and preceding text. When fine-tuned on MIMIC-CXR with weakly-supervised labels, a lightweight 0.5B-parameter PRM outperforms existing verification techniques, with relative improvements of, for instance, 7.5% in Matthews Correlation Coefficient and 1.8% in AUROC over strong white-box baselines on outputs from one LVLM. Unlike methods reliant on internal model states, our PRM demonstrates strong generalization to an unseen LVLM. We further show its practical utility: PRM scores effectively filter low-quality reports, improving F1-CheXbert scores by 4.5% when the worst 10% of reports are discarded. Moreover, when guiding a novel weighted best-of-N selection process on the MIMIC-CXR test set, our PRM shows relative improvements in clinical metrics of 7.4% for F1-CheXbert and 0.6% for BERTScore. These results demonstrate that a lightweight, context-aware PRM provides a model-agnostic safety layer for clinical LVLMs without access to internal activations.
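The report-filtering use described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how per-sentence scores are combined into a report-level score, so taking the minimum sentence score is an assumption here, and the names `report_score`, `filter_reports`, and `discard_percent` as well as the example scores are hypothetical.

```python
def report_score(sentence_scores, aggregate=min):
    """Collapse per-sentence correctness scores into one report-level score.

    Assumption: the weakest sentence decides the report's score (min).
    Any other callable over a list of floats can be passed instead.
    """
    if not sentence_scores:
        raise ValueError("a report needs at least one scored sentence")
    return aggregate(sentence_scores)


def filter_reports(reports, discard_percent=10, aggregate=min):
    """Return report ids ordered best to worst, minus the lowest-scoring share.

    `reports` maps a report id to its list of per-sentence scores.
    The number of discarded reports is rounded down (integer division).
    """
    ranked = sorted(
        reports,
        key=lambda rid: report_score(reports[rid], aggregate),
        reverse=True,
    )
    n_discard = len(ranked) * discard_percent // 100
    return ranked[: len(ranked) - n_discard]


# Hypothetical per-sentence scores for ten generated reports.
example = {
    "r0": [0.96, 0.91, 0.88],
    "r1": [0.93, 0.90],
    "r2": [0.89, 0.85, 0.92],
    "r3": [0.95, 0.12, 0.90],  # one sentence the verifier considers likely wrong
    "r4": [0.80, 0.82],
    "r5": [0.99, 0.97],
    "r6": [0.75, 0.78, 0.81],
    "r7": [0.91, 0.94],
    "r8": [0.86, 0.84],
    "r9": [0.70, 0.95],
}
kept = filter_reports(example)  # drops the single worst report out of ten
```

With the minimum as the aggregate, one low-scoring sentence is enough to push a report to the bottom of the ranking, which suits a safety filter. Passing a mean instead would tolerate an isolated weak sentence in an otherwise well-scored report.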
Oct-28-2025
- Country:
- Asia > Thailand
- Europe
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- North America > United States
- California > Santa Clara County > Palo Alto (0.04)
- Genre:
- Research Report
- Experimental Study (0.46)
- New Finding (0.66)
- Industry:
- Health & Medicine
- Diagnostic Medicine > Imaging (1.00)
- Nuclear Medicine (1.00)
- Technology: