AITopics | net benefit

Collaborating Authors

net benefit

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Evaluating classification performance across operating contexts: A comparison of decision curve analysis and cost curves

Millard, Louise AC, Flach, Peter A

arXiv.org Machine LearningSep-30-2025

Classification models typically predict a score and use a decision threshold to produce a classification. Appropriate model evaluation should carefully consider the context in which a model will be used, including the relative value of correct classifications of positive versus negative examples, which affects the threshold that should be used. Decision curve analysis (DCA) and cost curves are model evaluation approaches that assess the expected utility and expected loss of prediction models, respectively, across decision thresholds. We compared DCA and cost curves to determine how they are related, and their strengths and limitations. We demonstrate that decision curves are closely related to a specific type of cost curve called a Brier curve. Both curves are derived assuming model scores are calibrated and setting the classification threshold using the relative value of correct positive and negative classifications, and the x-axis of both curves are equivalent. Net benefit (used for DCA) and Brier loss (used for Brier curves) will always choose the same model as optimal at any given threshold. Across thresholds, differences in Brier loss are comparable whereas differences in net benefit cannot be compared. Brier curves are more generally applicable (when a wider range of thresholds are plausible), and the area under the Brier curve is the Brier score. We demonstrate that reference lines common in each space can be included in either and suggest the upper envelope decision curve as a useful comparison for DCA showing the possible gain in net benefit that could be achieved through recalibration alone.

brier curve, cost curve, threshold, (15 more...)

arXiv.org Machine Learning

2509.24608

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Bristol (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Predicting Chest Radiograph Findings from Electrocardiograms Using Interpretable Machine Learning

Matejas, Julia, Żurawski, Olaf, Strodthoff, Nils, Alcaraz, Juan Miguel Lopez

arXiv.org Artificial IntelligenceSep-23-2025

Purpose: Chest X-rays are essential for diagnosing pulmonary conditions, but limited access in resource-constrained settings can delay timely diagnosis. Electrocardiograms (ECGs), in contrast, are widely available, non-invasive, and often acquired earlier in clinical workflows. This study aims to assess whether ECG features and patient demographics can predict chest radiograph findings using an interpretable machine learning approach. Methods: Using the MIMIC-IV database, Extreme Gradient Boosting (XGBoost) classifiers were trained to predict diverse chest radiograph findings from ECG-derived features and demographic variables. Recursive feature elimination was performed independently for each target to identify the most predictive features. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) with bootstrapped 95% confidence intervals. Shapley Additive Explanations (SHAP) were applied to interpret feature contributions. Results: Models successfully predicted multiple chest radiograph findings with varying accuracy. Feature selection tailored predictors to each target, and including demographic variables consistently improved performance. SHAP analysis revealed clinically meaningful contributions from ECG features to radiographic predictions. Conclusion: ECG-derived features combined with patient demographics can serve as a proxy for certain chest radiograph findings, enabling early triage or pre-screening in settings where radiographic imaging is limited. Interpretable machine learning demonstrates potential to support radiology workflows and improve patient care.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2509.17674

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

A Consequentialist Critique of Binary Classification Evaluation Practices

Flores, Gerardo, Schiff, Abigail, Smith, Alyssa H., Fukuyama, Julia A, Wilson, Ashia C.

arXiv.org Machine LearningApr-6-2025

ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., Accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-ROC. We highlight that a consequentialist perspective, long advocated by decision theorists, should naturally favor evaluations that support independent decisions using a mixture of thresholds given their prevalence, such as Brier scores and Log loss. However, our empirical analysis reveals a strong preference for top-K metrics or fixed thresholds in evaluations at major conferences like ICML, FAccT, and CHIL. To address this gap, we use this decision-theoretic framework to map evaluation metrics to their optimal use cases, along with a Python package, briertools, to promote the broader adoption of Brier scores. In doing so, we also uncover new theoretical connections, including a reconciliation between the Brier Score and Decision Curve Analysis, which clarifies and responds to a longstanding critique by (Assel, et al. 2017) regarding the clinical utility of proper scoring rules.

brier score, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2504.04528

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Michigan (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Law (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

A Foundational Framework and Methodology for Personalized Early and Timely Diagnosis

Schubert, Tim, Peck, Richard W, Gimson, Alexander, Davtyan, Camelia, van der Schaar, Mihaela

arXiv.org Artificial IntelligenceNov-26-2023

Early diagnosis of diseases holds the potential for deep transformation in healthcare by enabling better treatment options, improving long-term survival and quality of life, and reducing overall cost. With the advent of medical big data, advances in diagnostic tests as well as in machine learning and statistics, early or timely diagnosis seems within reach. Early diagnosis research often neglects the potential for optimizing individual diagnostic paths. To enable personalized early diagnosis, a foundational framework is needed that delineates the diagnosis process and systematically identifies the time-dependent value of various diagnostic tests for an individual patient given their unique characteristics. Here, we propose the first foundational framework for early and timely diagnosis. It builds on decision-theoretic approaches to outline the diagnosis process and integrates machine learning and statistical methodology for estimating the optimal personalized diagnostic path. To describe the proposed framework as well as possibly other frameworks, we provide essential definitions. The development of a foundational framework is necessary for several reasons: 1) formalism provides clarity for the development of decision support tools; 2) observed information can be complemented with estimates of the future patient trajectory; 3) the net benefit of counterfactual diagnostic paths and associated uncertainties can be modeled for individuals 4) 'early' and 'timely' diagnosis can be clearly defined; 5) a mechanism emerges for assessing the value of technologies in terms of their impact on personalized early diagnosis, resulting health outcomes and incurred costs. Finally, we hope that this foundational framework will unlock the long-awaited potential of timely diagnosis and intervention, leading to improved outcomes for patients and higher cost-effectiveness for healthcare systems.

diagnosis, diagnostic path, information, (17 more...)

arXiv.org Artificial Intelligence

2311.16195

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(3 more...)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Explainable machine learning-based prediction model for diabetic nephropathy

Yin, Jing-Mei, Li, Yang, Xue, Jun-Tang, Zong, Guo-Wei, Fang, Zhong-Ze, Zou, Lang

arXiv.org Artificial IntelligenceOct-24-2023

The aim of this study is to analyze the effect of serum metabolites on diabetic nephropathy (DN) and predict the prevalence of DN through a machine learning approach. The dataset consists of 548 patients from April 2018 to April 2019 in Second Affiliated Hospital of Dalian Medical University (SAHDMU). We select the optimal 38 features through a Least absolute shrinkage and selection operator (LASSO) regression model and a 10-fold cross-validation. We compare four machine learning algorithms, including eXtreme Gradient Boosting (XGB), random forest, decision tree and logistic regression, by AUC-ROC curves, decision curves, calibration curves. We quantify feature importance and interaction effects in the optimal predictive model by Shapley Additive exPlanations (SHAP) method. The XGB model has the best performance to screen for DN with the highest AUC value of 0.966. The XGB model also gains more clinical net benefits than others and the fitting degree is better. In addition, there are significant interactions between serum metabolites and duration of diabetes. We develop a predictive model by XGB algorithm to screen for DN. C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys have great contribution in the model, and can possibly be biomarkers for DN.

diabetic nephropathy, serum metabolite, xgb model, (12 more...)

arXiv.org Artificial Intelligence

2309.1673

Country:

Asia > China > Liaoning Province > Dalian (0.25)
Asia > China > Tianjin Province > Tianjin (0.05)
Asia > China > Hunan Province (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Tensor-based Multimodal Learning for Prediction of Pulmonary Arterial Wedge Pressure from Cardiac MRI

Tripathi, Prasun C., Suvon, Mohammod N. I., Schobs, Lawrence, Zhou, Shuo, Alabed, Samer, Swift, Andrew J., Lu, Haiping

arXiv.org Artificial IntelligenceMar-13-2023

Heart failure is a serious and life-threatening condition that can lead to elevated pressure in the left ventricle. Pulmonary Arterial Wedge Pressure (PAWP) is an important surrogate marker indicating high pressure in the left ventricle. PAWP is determined by Right Heart Catheterization (RHC) but it is an invasive procedure. A non-invasive method is useful in quickly identifying high-risk patients from a large population. In this work, we develop a tensor learning-based pipeline for identifying PAWP from multimodal cardiac Magnetic Resonance Imaging (MRI). This pipeline extracts spatial and temporal features from high-dimensional scans. For quality control, we incorporate an epistemic uncertainty-based binning strategy to identify poor-quality training samples. To improve the performance, we learn complementary information by integrating features from multimodal data: cardiac MRI with short-axis and four-chamber views, and Electronic Health Records. The experimental analysis on a large cohort of $1346$ subjects who underwent the RHC procedure for PAWP estimation indicates that the proposed pipeline has a diagnostic value and can produce promising performance with significant improvement over the baseline in clinical practice (i.e., $\Delta$AUC $=0.10$, $\Delta$Accuracy $=0.06$, and $\Delta$MCC $=0.39$). The decision curve analysis further confirms the clinical utility of our method.

artificial intelligence, fusion, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2303.0754

Country:

Europe > United Kingdom > England > South Yorkshire > Sheffield (0.05)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Net benefit, calibration, threshold selection, and training objectives for algorithmic fairness in healthcare

Pfohl, Stephen R., Xu, Yizhe, Foryciarz, Agata, Ignatiadis, Nikolaos, Genkins, Julian, Shah, Nigam H.

arXiv.org Machine LearningFeb-3-2022

A growing body of work uses the paradigm of algorithmic fairness to frame the development of techniques to anticipate and proactively mitigate the introduction or exacerbation of health inequities that may follow from the use of model-guided decision-making. We evaluate the interplay between measures of model performance, fairness, and the expected utility of decision-making to offer practical recommendations for the operationalization of algorithmic fairness principles for the development and evaluation of predictive models in healthcare. We conduct an empirical case-study via development of models to estimate the ten-year risk of atherosclerotic cardiovascular disease to inform statin initiation in accordance with clinical practice guidelines. We demonstrate that approaches that incorporate fairness considerations into the model training objective typically do not improve model performance or confer greater net benefit for any of the studied patient populations compared to the use of standard learning paradigms followed by threshold selection concordant with patient preferences, evidence of intervention effectiveness, and model calibration. These results hold when the measured outcomes are not subject to differential measurement error across patient populations and threshold selection is unconstrained, regardless of whether differences in model performance metrics, such as in true and false positive error rates, are present. In closing, we argue for focusing model development efforts on developing calibrated models that predict outcomes well for all patient populations while emphasizing that such efforts are complementary to transparent reporting, participatory design, and reasoning about the impact of model-informed interventions in context.

net benefit, subgroup, threshold, (13 more...)

arXiv.org Machine Learning

2202.01906

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > Greenland (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Medicine Under the Magnifying Glass – Towards Data Science

#artificialintelligenceNov-27-2018, 10:19:26 GMT

In Part 1, I'll introduce the problem of bad medicine. As you review the evidence for it, you'll also get a good sense of why it's so entrenched. In Part 2, I'll examine the implications of the problem on artificial intelligence. In his history of Bad Medicine, David Wootton argued, "We can only think about medical progress if we start with the long tradition of medical failure."

artificial intelligence, medical practice, medicine, (16 more...)

#artificialintelligence

Country:

North America > United States (0.47)
North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report > Experimental Study (0.70)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.70)

Technology: Information Technology > Artificial Intelligence (0.86)

Add feedback

Cost-Optimal and Net-Benefit Planning — A Parameterised Complexity View

Aghighi, Meysam (Linköping University) | Bäckström, Christer (Linköping University)

AAAI ConferencesJul-15-2015

Cost-optimal planning (COP) uses action costs and asks for a minimum-cost plan. It is sometimes assumed that there is no harm in using actions with zero cost or rational cost. Classical complexity analysis does not contradict this assumption; planning is PSPACE-complete regardless of whether action costs are positive or non-negative, integer or rational. We thus apply parameterised complexity analysis to shed more light on this issue. Our main results are the following. COP is [W2]-complete for positive integer costs, i.e. it is no harder than finding a minimum-length plan, but it is paraNP-hard if the costs are non-negative integers or positive rationals. This is a very strong indication that the latter cases are substantially harder. Net-benefit planning (NBP) additionally assigns goal utilities and asks for a plan with maximum difference between its utility and its cost. NBP is paraNP-hard even when action costs and utilities are positive integers, suggesting that it is harder than COP. In addition, we also analyse a large number of subclasses, using both the PUBS restrictions and restricting the number of preconditions and effects.

action cost, artificial intelligence, planning & scheduling, (17 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Oklahoma > Payne County > Cushing (0.05)
Europe > Sweden > Östergötland County > Linköping (0.04)
Asia > China > Beijing > Beijing (0.04)
(17 more...)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)

Add feedback

Oversubscription Planning: Complexity and Compilability

Aghighi, Meysam (Linköping University) | Jonsson, Peter (Linköping University)

AAAI ConferencesJul-14-2014

Many real-world planning problems are oversubscription problems where all goals are not simultaneously achievable and the planner needs to find a feasible subset. We present complexity results for the so-called partial satisfaction and net benefit problems under various restrictions; this extends previous work by van den Briel et al. Our results reveal strong connections between these problems and with classical planning. We also present a method for efficiently compiling oversubscription problems into the ordinary plan existence problem; this can be viewed as a continuation of earlier work by Keyder & Geffner.

artificial intelligence, oversubscription planning, planning & scheduling, (15 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Europe > Sweden (0.14)
North America > United States (0.14)

Genre: Research Report (0.48)

Industry: Energy > Oil & Gas (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Add feedback