predictive value
Deep Learning-Based Regional White Matter Hyperintensity Mapping as a Robust Biomarker for Alzheimer's Disease
Machnio, Julia, Nielsen, Mads, Ghazi, Mostafa Mehdipour
White matter hyperintensities (WMH) are key imaging markers in cognitive aging, Alzheimer's disease (AD), and related dementias. Although automated methods for WMH segmentation have advanced, most provide only global lesion load and overlook their spatial distribution across distinct white matter regions. We propose a deep learning framework for robust WMH segmentation and localization, evaluated across public datasets and an independent Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Our results show that the predicted lesion loads are in line with the reference WMH estimates, confirming the robustness to variations in lesion load, acquisition, and demographics. Beyond accurate segmentation, we quantify WMH load within anatomically defined regions and combine these measures with brain structure volumes to assess diagnostic value. Regional WMH volumes consistently outperform global lesion burden for disease classification, and integration with brain atrophy metrics further improves performance, reaching area under the curve (AUC) values up to 0.97. Several spatially distinct regions, particularly within anterior white matter tracts, are reproducibly associated with diagnostic status, indicating localized vulnerability in AD. These results highlight the added value of regional WMH quantification. Incorporating localized lesion metrics alongside atrophy markers may enhance early diagnosis and stratification in neurodegenerative disorders.
- Europe > Denmark > Capital Region > Copenhagen (0.05)
- North America > United States > New York (0.04)
- North America > United States > California (0.04)
- (2 more...)
Towards actionable hypotension prediction -- predicting catecholamine therapy initiation in the intensive care unit
Koebe, Richard, Saibel, Noah, Alcaraz, Juan Miguel Lopez, Schäfer, Simon, Strodthoff, Nils
Hypotension in critically ill ICU patients is common and life-threatening. Escalation to catecholamine therapy marks a key management step, with both undertreatment and overtreatment posing risks. Most machine learning (ML) models predict hypotension using fixed MAP thresholds or MAP forecasting, overlooking the clinical decision behind treatment escalation. Predicting catecholamine initiation, the start of vasoactive or inotropic agent administration offers a more clinically actionable target reflecting real decision-making. Using the MIMIC-III database, we modeled catecholamine initiation as a binary event within a 15-minute prediction window. Input features included statistical descriptors from a two-hour sliding MAP context window, along with demographics, biometrics, comorbidities, and ongoing treatments. An Extreme Gradient Boosting (XGBoost) model was trained and interpreted via SHapley Additive exPlanations (SHAP). The model achieved an AUROC of 0.822 (0.813-0.830), outperforming the hypotension baseline (MAP < 65, AUROC 0.686 [0.675-0.699]). SHAP analysis highlighted recent MAP values, MAP trends, and ongoing treatments (e.g., sedatives, electrolytes) as dominant predictors. Subgroup analysis showed higher performance in males, younger patients (<53 years), those with higher BMI (>32), and patients without comorbidities or concurrent medications. Predicting catecholamine initiation based on MAP dynamics, treatment context, and patient characteristics supports the critical decision of when to escalate therapy, shifting focus from threshold-based alarms to actionable decision support. This approach is feasible across a broad ICU cohort under natural event imbalance. Future work should enrich temporal and physiological context, extend label definitions to include therapy escalation, and benchmark against existing hypotension prediction systems.
- Europe > Germany > Lower Saxony (0.04)
- Asia > Middle East > Israel (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
Hierarchical Section Matching Prediction (HSMP) BERT for Fine-Grained Extraction of Structured Data from Hebrew Free-Text Radiology Reports in Crohn's Disease
Badash, Zvi, Ben-Atya, Hadas, Gavrielov, Naama, Hazan, Liam, Focht, Gili, Cytter-Kuint, Ruth, Hagopian, Talar, Turner, Dan, Freiman, Moti
Extracting structured clinical information from radiology reports is challenging, especially in low-resource languages. This is pronounced in Crohn's disease, with sparsely represented multi-organ findings. We developed Hierarchical Structured Matching Prediction BERT (HSMP-BERT), a prompt-based model for extraction from Hebrew radiology text. In an administrative database study, we analyzed 9,683 reports from Crohn's patients imaged 2010-2023 across Israeli providers. A subset of 512 reports was radiologist-annotated for findings across six gastrointestinal organs and 15 pathologies, yielding 90 structured labels per subject. Multilabel-stratified split (66% train+validation; 33% test), preserving label prevalence. Performance was evaluated with accuracy, F1, Cohen's $κ$, AUC, PPV, NPV, and recall. On 24 organ-finding combinations with $>$15 positives, HSMP-BERT achieved mean F1 0.83$\pm$0.08 and $κ$ 0.65$\pm$0.17, outperforming the SMP zero-shot baseline (F1 0.49$\pm$0.07, $κ$ 0.06$\pm$0.07) and standard fine-tuning (F1 0.30$\pm$0.27, $κ$ 0.27$\pm$0.34; paired t-test $p < 10^{-7}$). Hierarchical inference cuts runtime 5.1$\times$ vs. traditional inference. Applied to all reports, it revealed associations among ileal wall thickening, stenosis, and pre-stenotic dilatation, plus age- and sex-specific trends in inflammatory findings. HSMP-BERT offers a scalable solution for structured extraction in radiology, enabling population-level analysis of Crohn's disease and demonstrating AI's potential in low-resource settings.
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Europe > Belgium > Flanders > West Flanders > Bruges (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Biomedical Informatics (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
Interpretable Machine Learning Model for Early Prediction of Acute Kidney Injury in Critically Ill Patients with Cirrhosis: A Retrospective Study
Sun, Li, Chen, Shuheng, Fan, Junyi, Si, Yong, Ahmadi, Minoo, Pishgar, Elham, Alaei, Kamiar, Pishgar, Maryam
Background: Cirrhosis is a progressive liver disease with high mortality and frequent complications, notably acute kidney injury (AKI), which occurs in up to 50% of hospitalized patients and worsens outcomes. AKI stems from complex hemodynamic, inflammatory, and metabolic changes, making early detection essential. Many predictive tools lack accuracy, interpretability, and alignment with intensive care unit (ICU) workflows. This study developed an interpretable machine learning model for early AKI prediction in critically ill patients with cirrhosis. Methods: We conducted a retrospective analysis of the MIMIC-IV v2.2 database, identifying 1240 adult ICU patients with cirrhosis and excluding those with ICU stays under 48 hours or missing key data. Laboratory and physiological variables from the first 48 hours were extracted. The pipeline included preprocessing, missingness filtering, LASSO feature selection, and SMOTE class balancing. Six algorithms-LightGBM, CatBoost, XGBoost, logistic regression, naive Bayes, and neural networks-were trained and evaluated using AUROC, accuracy, F1-score, sensitivity, specificity, and predictive values. Results: LightGBM achieved the best performance (AUROC 0.808, 95% CI 0.741-0.856; accuracy 0.704; NPV 0.911). Key predictors included prolonged partial thromboplastin time, absence of outside-facility 20G placement, low pH, and altered pO2, consistent with known cirrhosis-AKI mechanisms and suggesting actionable targets. Conclusion: The LightGBM-based model enables accurate early AKI risk stratification in ICU patients with cirrhosis using routine clinical variables. Its high negative predictive value supports safe de-escalation for low-risk patients, and interpretability fosters clinician trust and targeted prevention. External validation and integration into electronic health record systems are warranted.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Massachusetts (0.04)
- Asia > Middle East > Israel (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
You Only Debias Once: Towards Flexible Accuracy-Fairness Trade-offs at Inference Time
Han, Xiaotian, Chen, Tianlong, Zhou, Kaixiong, Jiang, Zhimeng, Wang, Zhangyang, Hu, Xia
Deep neural networks are prone to various bias issues, jeopardizing their applications for high-stake decision-making. Existing fairness methods typically offer a fixed accuracy-fairness trade-off, since the weight of the well-trained model is a fixed point (fairness-optimum) in the weight space. Nevertheless, more flexible accuracy-fairness trade-offs at inference time are practically desired since: 1) stakes of the same downstream task can vary for different individuals, and 2) different regions have diverse laws or regularization for fairness. If using the previous fairness methods, we have to train multiple models, each offering a specific level of accuracy-fairness trade-off. This is often computationally expensive, time-consuming, and difficult to deploy, making it less practical for real-world applications. To address this problem, we propose You Only Debias Once (YODO) to achieve in-situ flexible accuracy-fairness trade-offs at inference time, using a single model that trained only once. Instead of pursuing one individual fixed point (fairness-optimum) in the weight space, we aim to find a "line" in the weight space that connects the accuracy-optimum and fairness-optimum points using a single model. Points (models) on this line implement varying levels of accuracy-fairness trade-offs. At inference time, by manually selecting the specific position of the learned "line", our proposed method can achieve arbitrary accuracy-fairness trade-offs for different end-users and scenarios. Experimental results on tabular and image datasets show that YODO achieves flexible trade-offs between model accuracy and fairness, at ultra-low overheads. For example, if we need $100$ levels of trade-off on the \acse dataset, YODO takes $3.53$ seconds while training $100$ fixed models consumes $425$ seconds. The code is available at https://github.com/ahxt/yodo.
- North America > United States > North Carolina (0.04)
- Europe > Germany (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (2 more...)
Interactive Classification Metrics: A graphical application to build robust intuition for classification model evaluation
Brown, David H., Chicco, Davide
Machine learning continues to grow in popularity in academia, in industry, and is increasingly used in other fields. However, most of the common metrics used to evaluate even simple binary classification models have shortcomings that are neither immediately obvious nor consistently taught to practitioners. Here we present Interactive Classification Metrics (ICM), an application to visualize and explore the relationships between different evaluation metrics. The user changes the distribution statistics and explores corresponding changes across a suite of evaluation metrics. The interactive, graphical nature of this tool emphasizes the tradeoffs of each metric without the overhead of data wrangling and model training. The goals of this application are: (1) to aid practitioners in the ever-expanding machine learning field to choose the most appropriate evaluation metrics for their classification problem; (2) to promote careful attention to interpretation that is required even in the simplest scenarios like binary classification. Our application is publicly available for free under the MIT license as a Python package on PyPI at https://pypi.org/project/interactive-classification-metrics and on GitHub at https://github.com/davhbrown/interactive_classification_metrics.
- North America > Canada > Ontario > Toronto (0.15)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Health & Medicine (0.70)
- Government > Regional Government (0.69)
The Tile: A 2D Map of Ranking Scores for Two-Class Classification
Piérard, Sébastien, Halin, Anaïs, Cioppa, Anthony, Deliège, Adrien, Van Droogenbroeck, Marc
In the computer vision and machine learning communities, as well as in many other research domains, rigorous evaluation of any new method, including classifiers, is essential. One key component of the evaluation process is the ability to compare and rank methods. However, ranking classifiers and accurately comparing their performances, especially when taking application-specific preferences into account, remains challenging. For instance, commonly used evaluation tools like Receiver Operating Characteristic (ROC) and Precision/Recall (PR) spaces display performances based on two scores. Hence, they are inherently limited in their ability to compare classifiers across a broader range of scores and lack the capability to establish a clear ranking among classifiers. In this paper, we present a novel versatile tool, named the Tile, that organizes an infinity of ranking scores in a single 2D map for two-class classifiers, including common evaluation scores such as the accuracy, the true positive rate, the positive predictive value, Jaccard's coefficient, and all F-beta scores. Furthermore, we study the properties of the underlying ranking scores, such as the influence of the priors or the correspondences with the ROC space, and depict how to characterize any other score by comparing them to the Tile. Overall, we demonstrate that the Tile is a powerful tool that effectively captures all the rankings in a single visualization and allows interpreting them.
- North America > United States (0.04)
- Europe > Belgium > Wallonia > Liège Province > Liège (0.04)
- Asia > Middle East > Republic of Türkiye > Antalya Province > Antalya (0.04)
- (2 more...)
An Explainable AI Model for Predicting the Recurrence of Differentiated Thyroid Cancer
Ahmad, Mohammad Al-Sayed, Haddad, Jude
Thyroid carcinoma, a significant yet often controllable cancer, has seen a rise in cases, largely due to advancements in diagnostic methods. Differentiated thyroid cancer (DTC), which includes papillary and follicular varieties, is typically associated with a positive prognosis in academic circles. Nevertheless, there are still some individuals who may experience a recurrence. This study employs machine learning, particularly deep learning models, to predict the recurrence of DTC, with the goal of improving patient care through personalized treatment approaches. By analysing a dataset containing clinicopathological features of patients, the model achieved remarkable accuracy rates of 98% during training and 96% during testing. To improve the model's interpretability, we used techniques like LIME and Morris Sensitivity Analysis. These methods gave us valuable insights into how the model makes decisions. The results suggest that combining deep learning models with interpretability techniques can be extremely useful in quickly identifying the recurrence of thyroid cancer in patients. This can help in making informed therapeutic choices and customizing treatment approaches for individual patients.
- Asia > Middle East > Jordan (0.05)
- North America > United States (0.04)
- Research Report > Experimental Study (0.47)
- Research Report > New Finding (0.34)
- Health & Medicine > Therapeutic Area > Oncology > Thyroid Cancer (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Forecasting mortality associated emergency department crowding
Nevanlinna, Jalmari, Eidstø, Anna, Ylä-Mattila, Jari, Koivistoinen, Teemu, Oksala, Niku, Kanniainen, Juho, Palomäki, Ari, Roine, Antti
Emergency department (ED) crowding is a global public health issue that has been repeatedly associated with increased mortality. Predicting future service demand would enable preventative measures aiming to eliminate crowding along with it's detrimental effects. Recent findings in our ED indicate that occupancy ratios exceeding 90% are associated with increased 10-day mortality. In this paper, we aim to predict these crisis periods using retrospective data from a large Nordic ED with a LightGBM model. We provide predictions for the whole ED and individually for it's different operational sections. We demonstrate that afternoon crowding can be predicted at 11 a.m. with an AUC of 0.82 (95% CI 0.78-0.86) and at 8 a.m. with an AUC up to 0.79 (95% CI 0.75-0.83). Consequently we show that forecasting mortality-associated crowding using anonymous administrative data is feasible.
- Europe > Finland > Pirkanmaa > Tampere (0.05)
- Oceania > Australia (0.04)
- North America > Canada > Ontario (0.04)
- Europe > Finland > Tavastia Proper > Hämeenlinna (0.04)
Classification performance and reproducibility of GPT-4 omni for information extraction from veterinary electronic health records
Wulcan, Judit M, Jacques, Kevin L, Lee, Mary Ann, Kovacs, Samantha L, Dausend, Nicole, Prince, Lauren E, Wulcan, Jonatan, Marsilio, Sina, Keller, Stefan M
Large language models (LLMs) can extract information from veterinary electronic health records (EHRs), but performance differences between models, the effect of temperature settings, and the influence of text ambiguity have not been previously evaluated. This study addresses these gaps by comparing the performance of GPT-4 omni (GPT-4o) and GPT-3.5 Turbo under different conditions and investigating the relationship between human interobserver agreement and LLM errors. The LLMs and five humans were tasked with identifying six clinical signs associated with Feline chronic enteropathy in 250 EHRs from a veterinary referral hospital. At temperature 0, the performance of GPT-4o compared to the majority opinion of human respondents, achieved 96.9% sensitivity (interquartile range [IQR] 92.9-99.3%), 97.6% specificity (IQR 96.5-98.5%), 80.7% positive predictive value (IQR 70.8-84.6%), 99.5% negative predictive value (IQR 99.0-99.9%), 84.4% F1 score (IQR 77.3-90.4%), and 96.3% balanced accuracy (IQR 95.0-97.9%). The performance of GPT-4o was significantly better than that of its predecessor, GPT-3.5 Turbo, particularly with respect to sensitivity where GPT-3.5 Turbo only achieved 81.7% (IQR 78.9-84.8%). Adjusting the temperature for GPT-4o did not significantly impact classification performance. GPT-4o demonstrated greater reproducibility than human pairs regardless of temperature, with an average Cohen's kappa of 0.98 (IQR 0.98-0.99) at temperature 0 compared to 0.8 (IQR 0.78-0.81) for humans. Most GPT-4o errors occurred in instances where humans disagreed (35/43 errors, 81.4%), suggesting that these errors were more likely caused by ambiguity of the EHR than explicit model faults. Using GPT-4o to automate information extraction from veterinary EHRs is a viable alternative to manual extraction.
- North America > United States > California > Yolo County > Davis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)