calcification
Evaluating Generative AI as an Educational Tool for Radiology Resident Report Drafting
Verdone, Antonio, Cardall, Aidan, Siddiqui, Fardeen, Nashawaty, Motaz, Rigau, Danielle, Kwon, Youngjoon, Yousef, Mira, Patel, Shalin, Kieturakis, Alex, Kim, Eric, Heacock, Laura, Reig, Beatriu, Shen, Yiqiu
Objective: Radiology residents require timely, personalized feedback to develop accurate image analysis and reporting skills. Increasing clinical workload often limits attendings' ability to provide guidance. This study evaluates a HIPAA-compliant GPT-4o system that delivers automated feedback on breast imaging reports drafted by residents in real clinical settings. Methods: We analyzed 5,000 resident-attending report pairs from routine practice at a multi-site U.S. health system. GPT-4o was prompted with clinical instructions to identify common errors and provide feedback. A reader study using 100 report pairs was conducted. Four attending radiologists and four residents independently reviewed each pair, determined whether predefined error types were present, and rated GPT-4o's feedback as helpful or not. Agreement between GPT and readers was assessed using percent match. Inter-reader reliability was measured with Krippendorff's alpha. Educational value was measured as the proportion of cases rated helpful. Results: Three common error types were identified: (1) omission or addition of key findings, (2) incorrect use or omission of technical descriptors, and (3) final assessment inconsistent with findings. GPT-4o showed strong agreement with attending consensus: 90.5%, 78.3%, and 90.4% across error types. Inter-reader reliability showed moderate variability (α = 0.767, 0.595, 0.567), and replacing a human reader with GPT-4o did not significantly affect agreement (Δ = -0.004 to 0.002). GPT's feedback was rated helpful in most cases: 89.8%, 83.0%, and 92.0%. Discussion: ChatGPT-4o can reliably identify key educational errors. It may serve as a scalable tool to support radiology education.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Ventura County > Thousand Oaks (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
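The percent-match agreement described in the abstract above can be illustrated with a minimal sketch. The consensus rule (a tie among readers counts as "error present"), the function names, and the toy votes are all assumptions made for illustration; the abstract does not specify how attending consensus was formed.

```python
def consensus(votes):
    """Majority vote over binary reader judgments.

    Assumption: a tie among readers counts as "error present".
    """
    return sum(votes) * 2 >= len(votes)


def percent_match(model_flags, reader_votes):
    """Percentage of cases where the model's flag equals the reader consensus."""
    matches = sum(
        flag == consensus(votes)
        for flag, votes in zip(model_flags, reader_votes)
    )
    return 100.0 * matches / len(model_flags)


# Hypothetical toy data: four cases, four readers each (True = error present).
reader_votes = [
    [True, True, False, True],
    [False, False, False, True],
    [True, True, True, True],
    [False, True, False, False],
]
model_flags = [True, False, True, True]

print(percent_match(model_flags, reader_votes))  # 75.0
```

Percent match does not correct for chance agreement, which is presumably why the study also reports Krippendorff's alpha for inter-reader reliability.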
MammoClean: Toward Reproducible and Bias-Aware AI in Mammography through Dataset Harmonization
Zafari, Yalda, Pan, Hongyi, Durak, Gorkem, Bagci, Ulas, Rashed, Essam A., Mabrok, Mohamed
The development of clinically reliable artificial intelligence (AI) systems for mammography is hindered by profound heterogeneity in data quality, metadata standards, and population distributions across public datasets. This heterogeneity introduces dataset-specific biases that severely compromise the generalizability of the model, a fundamental barrier to clinical deployment. We present MammoClean, a public framework for standardization and bias quantification in mammography datasets. MammoClean standardizes case selection and image processing (including laterality and intensity correction) and unifies metadata into a consistent multi-view structure. We provide a comprehensive review of breast anatomy, imaging characteristics, and public mammography datasets to systematically identify key sources of bias. Applying MammoClean to three heterogeneous datasets (CBIS-DDSM, TOMPEI-CMMD, VinDr-Mammo), we quantify substantial distributional shifts in breast density and abnormality prevalence. Critically, we demonstrate the direct impact of data corruption: AI models trained on corrupted datasets exhibit significant performance degradation compared to their curated counterparts. By using MammoClean to identify and mitigate bias sources, researchers can construct unified multi-dataset training corpora that enable development of robust models with superior cross-domain generalization. MammoClean provides an essential, reproducible pipeline for bias-aware AI development in mammography, facilitating fairer comparisons and advancing the creation of safe, effective systems that perform equitably across diverse patient populations and clinical settings. The open-source code is publicly available from: https://github.com/Minds-R-Lab/MammoClean.
- Europe > United Kingdom (0.04)
- Europe > Sweden (0.04)
- Oceania > Australia (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Experimental Study (0.93)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Predicting Chest Radiograph Findings from Electrocardiograms Using Interpretable Machine Learning
Matejas, Julia, Żurawski, Olaf, Strodthoff, Nils, Alcaraz, Juan Miguel Lopez
Purpose: Chest X-rays are essential for diagnosing pulmonary conditions, but limited access in resource-constrained settings can delay timely diagnosis. Electrocardiograms (ECGs), in contrast, are widely available, non-invasive, and often acquired earlier in clinical workflows. This study aims to assess whether ECG features and patient demographics can predict chest radiograph findings using an interpretable machine learning approach. Methods: Using the MIMIC-IV database, Extreme Gradient Boosting (XGBoost) classifiers were trained to predict diverse chest radiograph findings from ECG-derived features and demographic variables. Recursive feature elimination was performed independently for each target to identify the most predictive features. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) with bootstrapped 95% confidence intervals. Shapley Additive Explanations (SHAP) were applied to interpret feature contributions. Results: Models successfully predicted multiple chest radiograph findings with varying accuracy. Feature selection tailored predictors to each target, and including demographic variables consistently improved performance. SHAP analysis revealed clinically meaningful contributions from ECG features to radiographic predictions. Conclusion: ECG-derived features combined with patient demographics can serve as a proxy for certain chest radiograph findings, enabling early triage or pre-screening in settings where radiographic imaging is limited. Interpretable machine learning demonstrates potential to support radiology workflows and improve patient care.
- North America > United States > Massachusetts (0.04)
- Europe > Germany > Lower Saxony > Oldenburg (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.88)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
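The evaluation metric named in the abstract above, AUROC with bootstrapped 95% confidence intervals, can be sketched in plain Python. This is an illustrative percentile bootstrap under assumed settings (1,000 resamples, fixed seed, toy labels and scores); it is not the authors' evaluation code, and the function names are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data (1 = finding present).
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

print(auroc(labels, scores))  # 0.875
print(bootstrap_ci(labels, scores))
```

The pairwise formulation is quadratic in the sample size, which is fine for a sketch; a rank-based implementation would be preferable on a dataset the size of MIMIC-IV.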
Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform
Liu, Qinghui, Nesvold, Jon E., Raaum, Hanna, Murugesu, Elakkyen, Røvang, Martin, MacIntosh, Bradley J., Bjørnerud, Atle, Skogen, Karoline
Background: There are many challenges and opportunities in the clinical deployment of AI tools in radiology. The current study describes a radiology software platform called NeoMedSys that can enable efficient deployment and refinement of AI models. We evaluated the feasibility and effectiveness of running NeoMedSys for three months in real-world clinical settings and focused on improving the performance of an in-house developed AI model (VIOLA-AI) designed for intracranial hemorrhage (ICH) detection. Methods: NeoMedSys integrates tools for deploying, testing, and optimizing AI models with a web-based medical image viewer, annotation system, and hospital-wide radiology information systems. A prospective pragmatic investigation was deployed using clinical cases of patients presenting to the largest Emergency Department in Norway (site-1) with suspected traumatic brain injury (TBI) or patients with suspected stroke (site-2). We assessed ICH classification performance as VIOLA-AI encountered new data and underwent pre-planned model retraining. Performance metrics included sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC). Results: NeoMedSys facilitated iterative improvements in the AI model, significantly enhancing its diagnostic accuracy. Automated bleed detection and segmentation were reviewed in near real-time to facilitate re-training VIOLA-AI. The iterative refinement process yielded a marked improvement in classification sensitivity, rising to 90.3% (from 79.2%), and specificity that reached 89.3% (from 80.7%). The bleed detection ROC analysis for the entire sample demonstrated a high AUC of 0.949 (from 0.873). Model refinement stages were associated with notable gains, highlighting the value of real-time radiologist feedback.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Norway > Eastern Norway > Oslo (0.05)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Therapeutic Area > Hematology (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
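The classification metrics listed in the abstract above (sensitivity, specificity, accuracy) follow directly from confusion-matrix counts. The sketch below uses hypothetical counts chosen for illustration; they are not taken from the study, and the function name is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true bleeds that were flagged
        "specificity": tn / (tn + fp),  # share of bleed-free scans left unflagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, not from the study.
metrics = classification_metrics(tp=90, fp=10, tn=90, fn=10)
print(metrics)
```

Tracking these values separately before and after each retraining round shows whether a gain in sensitivity came at the cost of specificity, which a single accuracy figure would hide.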
Is ChatGPT-5 Ready for Mammogram VQA?
Li, Qiang, Wang, Shansong, Hu, Mingzhe, Safari, Mojtaba, Eidex, Zachary, Yang, Xiaofeng
Mammogram visual question answering (VQA) integrates image interpretation with clinical reasoning and has potential to support breast cancer screening. We systematically evaluated the GPT-5 family and GPT-4o model on four public mammography datasets (EMBED, InBreast, CMMD, CBIS-DDSM) for BI-RADS assessment, abnormality detection, and malignancy classification tasks. GPT-5 was consistently the best-performing model but lagged behind both human experts and domain-specific fine-tuned models. On EMBED, GPT-5 achieved the highest scores among GPT variants in density (56.8%), distortion (52.5%), mass (64.5%), calcification (63.5%), and malignancy (52.8%) classification. On InBreast, it attained 36.9% BI-RADS accuracy, 45.9% abnormality detection, and 35.0% malignancy classification. On CMMD, GPT-5 reached 32.3% abnormality detection and 55.0% malignancy accuracy. On CBIS-DDSM, it achieved 69.3% BI-RADS accuracy, 66.0% abnormality detection, and 58.2% malignancy accuracy. Compared with human expert estimations, GPT-5 exhibited lower sensitivity (63.5%) and specificity (52.3%). While GPT-5 exhibits promising capabilities for screening tasks, its performance remains insufficient for high-stakes clinical imaging applications without targeted domain adaptation and optimization. However, the substantial improvement in performance from GPT-4o to GPT-5 suggests a promising trend in the potential for general large language models (LLMs) to assist with mammography VQA tasks.
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.81)
On the effectiveness of multimodal privileged knowledge distillation in two vision transformer based diagnostic applications
Baur, Simon, Benova, Alexandra, Cantú, Emilio Dolgener, Ma, Jackie
Deploying deep learning models in clinical practice often requires leveraging multiple data modalities, such as images, text, and structured data, to achieve robust and trustworthy decisions. However, not all modalities are always available at inference time. In this work, we propose multimodal privileged knowledge distillation (MMPKD), a training strategy that utilizes additional modalities available solely during training to guide a unimodal vision model. Specifically, we used a text-based teacher model for chest radiographs (MIMIC-CXR) and a tabular metadata-based teacher model for mammography (CBIS-DDSM) to distill knowledge into a vision transformer student model. We show that MMPKD can improve the zero-shot capability of the resulting attention maps to localize ROIs in input images, although this effect does not generalize across domains, contrary to what prior research suggested.
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.38)
Deep Learning-Based Breast Cancer Detection in Mammography: A Multi-Center Validation Study in Thai Population
Chamveha, Isarun, Chaiyungyuen, Supphanut, Worakriangkrai, Sasinun, Prasawang, Nattawadee, Chaisangmongkon, Warasinee, Korpraphong, Pornpim, Suvannarerg, Voraparee, Thiravit, Shanigarn, Kannawat, Chalermdej, Rungsinaporn, Kewalin, Issaragrisil, Suwara, Chadbunchachai, Payia, Gatechumpol, Pattiya, Muktabhant, Chawiporn, Sereerat, Patarachai
This study presents a deep learning system for breast cancer detection in mammography, developed using a modified EfficientNetV2 architecture with enhanced attention mechanisms. The model was trained on mammograms from a major Thai medical center and validated on three distinct datasets: an in-domain test set (9,421 cases), a biopsy-confirmed set (883 cases), and an out-of-domain generalizability set (761 cases) collected from two different hospitals. For cancer detection, the model achieved AUROCs of 0.89, 0.96, and 0.94 on the respective datasets. The system's lesion localization capability, evaluated using metrics including Lesion Localization Fraction (LLF) and Non-Lesion Localization Fraction (NLF), demonstrated robust performance in identifying suspicious regions. Clinical validation through concordance tests showed strong agreement with radiologists: 83.5% classification and 84.0% localization concordance for biopsy-confirmed cases, and 78.1% classification and 79.6% localization concordance for out-of-domain cases. Expert radiologists' acceptance rate averaged 96.7% for biopsy-confirmed cases and 89.3% for out-of-domain cases. The system achieved a System Usability Scale score of 74.17 for the source hospital and 69.20 for the validation hospitals, indicating good clinical acceptance. These results demonstrate the model's effectiveness in assisting mammogram interpretation, with the potential to enhance breast cancer screening workflows in clinical practice.
- Asia > Thailand > Bangkok > Bangkok (0.05)
- North America > United States > Maryland > Montgomery County > Silver Spring (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Enhancing Coronary Artery Calcium Scoring via Multi-Organ Segmentation on Non-Contrast Cardiac Computed Tomography
Nalepa, Jakub, Bartczak, Tomasz, Bujny, Mariusz, Gośliński, Jarosław, Jesionek, Katarzyna, Malara, Wojciech, Malawski, Filip, Miszalski-Jamka, Karol, Rewa, Patrycja, Kostur, Marcin
Despite coronary artery calcium scoring being considered a largely solved problem within the realm of medical artificial intelligence, this paper argues that significant improvements can still be made. By shifting the focus from pathology detection to a deeper understanding of anatomy, the novel algorithm proposed in the paper both achieves high accuracy in coronary artery calcium scoring and offers enhanced interpretability of the results. This approach not only aids in the precise quantification of calcifications in coronary arteries, but also provides valuable insights into the underlying anatomical structures. Through this anatomically-informed methodology, the paper shows how a nuanced understanding of the heart's anatomy can lead to more accurate and interpretable results in the field of cardiovascular health. We demonstrate the superior accuracy of the proposed method by evaluating it on an open-source multi-vendor dataset, where we obtain results at the inter-observer level, surpassing the current state of the art. Finally, the qualitative analyses show the practical value of the algorithm in such tasks as labeling coronary artery calcifications, identifying aortic calcifications, and filtering out false positive detections due to noise.
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring
Gokmen, Mahmut S., Ozcan, Caner, Haque, Moneera N., Leung, Steve W., Parker, C. Seth, Seales, W. Brent, Bumgardner, Cody
Coronary artery disease (CAD), one of the leading causes of mortality worldwide, necessitates effective risk assessment strategies, with coronary artery calcium (CAC) scoring via computed tomography (CT) being a key method for prevention. Traditional methods, primarily based on UNET architectures implemented on pre-built models, face challenges like the scarcity of annotated CT scans containing CAC and imbalanced datasets, leading to reduced performance in segmentation and scoring tasks. In this study, we address these limitations by incorporating the self-supervised learning (SSL) technique of DINO (self-distillation with no labels), which trains without requiring CAC-specific annotations, enhancing its robustness in generating distinct features. The DINO-LG model, which leverages label guidance to focus on calcified areas, achieves significant improvements, with a sensitivity of 89% and specificity of 90% for detecting CAC-containing CT slices, compared to the standard DINO model's sensitivity of 79% and specificity of 77%. Additionally, false-negative and false-positive rates are reduced by 49% and 59%, respectively, instilling greater confidence in clinicians when ruling out calcification in low-risk patients and minimizing unnecessary imaging reviews by radiologists. Further, CAC scoring and segmentation tasks are conducted using a basic UNET architecture, applied specifically to CT slices identified by the DINO-LG model as containing calcified areas. This targeted approach enhances CAC scoring accuracy by feeding the UNET model with relevant slices, significantly improving diagnostic precision, reducing both false positives and false negatives, and ultimately lowering overall healthcare costs by minimizing unnecessary tests and treatments, presenting a valuable advancement in CAD risk assessment.
- Europe > Netherlands > Drenthe > Assen (0.04)
- North America > United States > Michigan (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
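The relationship between the sensitivity and specificity figures in the abstract above and its reported error-rate reductions can be checked with a short calculation. This sketch assumes the false-negative rate is 1 minus sensitivity and the false-positive rate is 1 minus specificity, and it uses the rounded percentages from the abstract, so the results land near, but not exactly on, the reported 49% and 59%; the gap is presumably due to rounding in the published figures.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Rounded figures from the abstract: standard DINO vs. DINO-LG.
fn_before, fn_after = 1 - 0.79, 1 - 0.89  # false-negative rate = 1 - sensitivity
fp_before, fp_after = 1 - 0.77, 1 - 0.90  # false-positive rate = 1 - specificity

fn_drop = relative_reduction(fn_before, fn_after)
fp_drop = relative_reduction(fp_before, fp_after)

print(f"false-negative rate reduced by about {fn_drop:.0%}")
print(f"false-positive rate reduced by about {fp_drop:.0%}")
```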
Full Field Digital Mammography Dataset from a Population Screening Program
Kendall, Edward, Hajishafiezahramini, Paraham, Hamilton, Matthew, Doyle, Gregory, Wadden, Nancy, Meruvia-Pastor, Oscar
Breast cancer presents the second largest cancer risk in the world to women. Early detection of cancer has been shown to be effective in reducing mortality. Population screening programs schedule regular mammography imaging for participants, promoting early detection. Currently, such screening programs require manual reading. False-positive errors in the reading process lead to unnecessary, costly follow-up and patient anxiety. Automated methods promise to provide more efficient, consistent and effective reading. To facilitate their development, a number of datasets have been created. With the aim of specifically targeting population screening programs, we introduce NL-Breast-Screening, a dataset from a Canadian provincial screening program. The dataset consists of 5997 mammography exams, each of which has four standard views and is biopsy-confirmed. Cases where the radiologist reading was a false positive are identified. NL-Breast-Screening is made publicly available as a new resource to promote advances in automation for population screening programs.
- North America > Canada > Newfoundland and Labrador > Newfoundland > St. John's (0.05)
- North America > Canada > Newfoundland and Labrador > Labrador (0.05)
- Oceania > Australia (0.04)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)