mimic-cxr
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.95)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Israel (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Health Care Technology (0.69)
- Health & Medicine > Nuclear Medicine (0.68)
Structure is Supervision: Multiview Masked Autoencoders for Radiology
Laguna, Sonia, Agostini, Andrea, Ryser, Alain, Ruiperez-Campillo, Samuel, Cannistraci, Irene, Vandenhirtz, Moritz, Mandt, Stephan, Deperrois, Nicolas, Nooralahzadeh, Farhad, Krauthammer, Michael, Sutter, Thomas M., Vogt, Julia E.
Building robust medical machine learning systems requires pretraining strategies that exploit the intrinsic structure present in clinical data. We introduce Multiview Masked Autoencoder (MVMAE), a self-supervised framework that leverages the natural multi-view organization of radiology studies to learn view-invariant and disease-relevant representations. MVMAE combines masked image reconstruction with cross-view alignment, transforming clinical redundancy across projections into a powerful self-supervisory signal. We further extend this approach with MVMAE-V2T, which incorporates radiology reports as an auxiliary text-based learning signal to enhance semantic grounding while preserving fully vision-based inference. Evaluated on a downstream disease classification task on three large-scale public datasets, MIMIC-CXR, CheXpert, and PadChest, MVMAE consistently outperforms supervised and vision-language baselines. Furthermore, MVMAE-V2T provides additional gains, particularly in low-label regimes where structured textual supervision is most beneficial. Together, these results establish the importance of structural and textual supervision as complementary paths toward scalable, clinically grounded medical foundation models.
- Europe > Switzerland > Zürich > Zürich (0.05)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- North America > United States (0.04)
- (3 more...)
- Research Report > Experimental Study (0.88)
- Research Report > New Finding (0.68)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
S2D-ALIGN: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation
Gao, Jiechao, Liu, Chang, Li, Yuangang
Radiology Report Generation (RRG) aims to automatically generate diagnostic reports from radiology images. To achieve this, existing methods have leveraged the powerful cross-modal generation capabilities of Multimodal Large Language Models (MLLMs), primarily focusing on optimizing cross-modal alignment between radiographs and reports through Supervised Fine-Tuning (SFT). However, by only performing instance-level alignment with the image-text pairs, the standard SFT paradigm fails to establish anatomically-grounded alignment, where the templated nature of reports often leads to sub-optimal generation quality. To address this, we propose \textsc{S2D-Align}, a novel SFT paradigm that establishes anatomically-grounded alignment by leveraging auxiliary signals of varying granularities. \textsc{S2D-Align} implements a shallow-to-deep strategy, progressively enriching the alignment process: it begins with the coarse radiograph-report pairing, then introduces reference reports for instance-level guidance, and ultimately utilizes key phrases to ground the generation in specific anatomical details. To bridge the different alignment stages, we introduce a memory-based adapter that empowers feature sharing, thereby integrating coarse and fine-grained guidance. For evaluation, we conduct experiments on the public \textsc{MIMIC-CXR} and \textsc{IU X-Ray} benchmarks, where \textsc{S2D-Align} achieves state-of-the-art performance compared to existing methods. Ablation studies validate the effectiveness of our multi-stage, auxiliary-guided approach, highlighting a promising direction for enhancing grounding capabilities in complex, multi-modal generation tasks.
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Indiana (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (5 more...)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- North America > United States > Texas > Tarrant County > Arlington (0.04)
- Europe > United Kingdom > England > Surrey > Guildford (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- Instructional Material > Online (0.41)
- Instructional Material > Course Syllabus & Notes (0.41)
- Health & Medicine > Diagnostic Medicine > Imaging (0.84)
- Information Technology > Security & Privacy (0.69)
EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation
Zhang, Kai, Malon, Christopher, Sun, Lichao, Min, Martin Renqiang
Radiology report generation requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. Although recent innovations, particularly multimodal large language models, have shown improved performance, their supervised fine-tuning (SFT) objective is not explicitly aligned with clinical efficacy. In this work, we introduce EditGRPO, a mixed-policy reinforcement learning algorithm designed specifically to optimize the generation through clinically motivated rewards. EditGRPO integrates on-policy exploration with off-policy guidance by injecting sentence-level detailed corrections during training rollouts. This mixed-policy approach addresses the exploration dilemma and sampling efficiency issues typically encountered in RL. Applied to a Qwen2.5-VL-3B, EditGRPO outperforms both SFT and vanilla GRPO baselines, achieving an average improvement of 3.4\% in clinical metrics across four major datasets. Notably, EditGRPO also demonstrates superior out-of-domain generalization, with an average performance gain of 5.9\% on unseen datasets.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Nuclear Medicine (0.91)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Israel (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Health Care Technology (0.69)
- Health & Medicine > Nuclear Medicine (0.68)
Limitations of Public Chest Radiography Datasets for Artificial Intelligence: Label Quality, Domain Shift, Bias and Evaluation Challenges
Rafferty, Amy, Ramaesh, Rishi, Rajan, Ajitha
Artificial intelligence has shown significant promise in chest radiography, where deep learning models can approach radiologist-level diagnostic performance. Progress has been accelerated by large public datasets such as MIMIC-CXR, ChestX-ray14, PadChest, and CheXpert, which provide hundreds of thousands of labelled images with pathology annotations. However, these datasets also present important limitations. Automated label extraction from radiology reports introduces errors, particularly in handling uncertainty and negation, and radiologist review frequently disagrees with assigned labels. In addition, domain shift and population bias restrict model generalisability, while evaluation practices often overlook clinically meaningful measures. We conduct a systematic analysis of these challenges, focusing on label quality, dataset bias, and domain shift. Our cross-dataset domain shift evaluation across multiple model architectures revealed substantial external performance degradation, with pronounced reductions in AUPRC and F1 scores relative to internal testing. To assess dataset bias, we trained a source-classification model that distinguished datasets with near-perfect accuracy, and performed subgroup analyses showing reduced performance for minority age and sex groups. Finally, expert review by two board-certified radiologists identified significant disagreement with public dataset labels. Our findings highlight important clinical weaknesses of current benchmarks and emphasise the need for clinician-validated datasets and fairer evaluation frameworks.
- Asia > Middle East > Israel (0.04)
- North America > United States > Maryland (0.04)
- Europe > Spain (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts
Cooke, Lauren H., Jung, Matthias, Brendel, Jan M., Kerkovits, Nora M., Foldyna, Borek, Lu, Michael T., Raghu, Vineet K.
Chest radiographs (CXRs) are among the most common tests in medicine. Automated image interpretation may reduce radiologists\' workload and expand access to diagnostic expertise. Deep learning multi-task and foundation models have shown strong performance for CXR interpretation but are vulnerable to shortcut learning, where models rely on spurious and off-target correlations rather than clinically relevant features to make decisions. We introduce RoentMod, a counterfactual image editing framework that generates anatomically realistic CXRs with user-specified, synthetic pathology while preserving unrelated anatomical features of the original scan. RoentMod combines an open-source medical image generator (RoentGen) with an image-to-image modification model without requiring retraining. In reader studies with board-certified radiologists and radiology residents, RoentMod-produced images appeared realistic in 93\% of cases, correctly incorporated the specified finding in 89-99\% of cases, and preserved native anatomy comparable to real follow-up CXRs. Using RoentMod, we demonstrate that state-of-the-art multi-task and foundation models frequently exploit off-target pathology as shortcuts, limiting their specificity. Incorporating RoentMod-generated counterfactual images during training mitigated this vulnerability, improving model discrimination across multiple pathologies by 3-19\% AUC in internal validation and by 1-11\% for 5 out of 6 tested pathologies in external testing. These findings establish RoentMod as a broadly applicable tool for probing and correcting shortcut learning in medical AI. By enabling controlled counterfactual interventions, RoentMod enhances the robustness and interpretability of CXR interpretation models and provides a generalizable strategy for improving foundation models in medical imaging.
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (9 more...)
- North America > United States > Texas > Tarrant County > Arlington (0.04)
- Europe > United Kingdom > England > Surrey > Guildford (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- Instructional Material > Online (0.41)
- Instructional Material > Course Syllabus & Notes (0.41)
- Health & Medicine > Diagnostic Medicine > Imaging (0.84)
- Information Technology > Security & Privacy (0.69)