
Accounting for Missing Covariates in Heterogeneous Treatment Estimation

Yamin, Khurram, Sharma, Vibhhu, Kennedy, Ed, Wilder, Bryan

arXiv.org Artificial Intelligence

Many applications of causal inference require using treatment effects estimated on a study population to make decisions in a separate target population. We consider the challenging setting where there are covariates that are observed in the target population that were not seen in the original study. Our goal is to estimate the tightest possible bounds on heterogeneous treatment effects conditioned on such newly observed covariates. We introduce a novel partial identification strategy based on ideas from ecological inference; the main idea is that estimates of conditional treatment effects for the full covariate set must marginalize correctly when restricted to only the covariates observed in both populations.

For example, if the initial study was an RCT, it may have failed to measure practically important covariates [Kahan et al., 2014] such as social determinants of health [Huang et al., 2024]. Since the intervention has not previously been used by the health system, no outcome data linked to these new covariates is available. However, treatment decisions would ideally reflect whether the intervention is likely to be beneficial to a patient conditional on all information available, not just covariates that happened to be in the original study. This paper studies the question: how precisely can we identify treatment effects conditional on such new covariates? If precise estimates are available, the decision maker can proceed confidently with deployment. Conversely, if considerable uncertainty remains about an important subgroup, a decision maker may exercise more caution or invest more resources in …
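The marginalization constraint at the heart of this approach can be illustrated with a small numeric sketch (hypothetical numbers and function name, not the authors' estimator): for a binary newly observed covariate V with known prevalence p = P(V=1|X), any candidate pair of subgroup effects must average back to the identified marginal CATE, and combining that identity with a priori effect bounds yields interval bounds on each subgroup effect.

```python
def cate_bounds(tau_x, p, lo=-1.0, hi=1.0):
    """Bound tau(X, V=1) and tau(X, V=0) given the marginal CATE tau(X),
    prevalence p = P(V=1 | X), and a priori effect bounds [lo, hi].

    The identity tau(X) = p*tau(X,1) + (1-p)*tau(X,0) means each subgroup
    effect is extremized when the other sits at one of its a priori bounds."""
    t1_lo = max(lo, (tau_x - (1 - p) * hi) / p)   # tau(X,0) at its maximum
    t1_hi = min(hi, (tau_x - (1 - p) * lo) / p)   # tau(X,0) at its minimum
    t0_lo = max(lo, (tau_x - p * hi) / (1 - p))
    t0_hi = min(hi, (tau_x - p * lo) / (1 - p))
    return (t1_lo, t1_hi), (t0_lo, t0_hi)

# Hypothetical numbers: marginal effect 0.2, 25% of the target has V=1
b1, b0 = cate_bounds(0.2, p=0.25)
```

With only 25% of the target in the V=1 subgroup, its bounds stay at the uninformative [-1, 1], while the larger V=0 subgroup is bounded far more tightly, matching the intuition that effects in small, newly observed subgroups are harder to pin down.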


Sample Selection Bias in Machine Learning for Healthcare

Chauhan, Vinod Kumar, Clifton, Lei, Salaün, Achille, Lu, Huiqi Yvonne, Branson, Kim, Schwab, Patrick, Nigam, Gaurav, Clifton, David A.

arXiv.org Artificial Intelligence

While machine learning algorithms hold promise for personalised medicine, their clinical adoption remains limited. One critical factor contributing to this limited uptake is sample selection bias (SSB), which refers to the study population being less representative of the target population, leading to biased and potentially harmful decisions. Despite being well-known in the literature, SSB remains scarcely studied in machine learning for healthcare. Moreover, the existing techniques try to correct the bias by balancing distributions between the study and the target populations, which may result in a loss of predictive performance. To address these problems, our study illustrates the potential risks associated with SSB by examining SSB's impact on the performance of machine learning algorithms. Most importantly, we propose a new research direction for addressing SSB, based on target population identification rather than bias correction. Specifically, we propose two independent networks (T-Net) and a multitasking network (MT-Net) for addressing SSB, where one network/task identifies the target subpopulation which is representative of the study population and the second makes predictions for the identified subpopulation. Our empirical results with synthetic and semi-synthetic datasets highlight that SSB can lead to a large drop in the performance of an algorithm for the target population as compared with the study population, as well as a substantial difference in performance between the target subpopulations representative of the selected and the non-selected patients from the study population. Furthermore, our proposed techniques demonstrate robustness across various settings, including different dataset sizes, event rates, and selection rates, outperforming the existing bias correction techniques.
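The two-network idea can be caricatured in a few lines of numpy (a toy sketch with an invented selection rule and a hand-rolled logistic model, not the paper's T-Net/MT-Net architecture): one model learns which target units resemble the study population, and predictions from a study-trained model would then be restricted to that representative subpopulation rather than applied everywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: the study only enrols patients with x < 1, so part of the
# target population was never seen during training (the selection bias).
n = 2000
x_target = rng.normal(size=(n, 1))
selected = (x_target[:, 0] < 1.0).astype(float)  # study-selection indicator

def fit_logistic(X, y, steps=500, lr=0.5):
    """Tiny logistic-regression 'network' trained by gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # add a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return lambda Z: 1.0 / (1.0 + np.exp(-np.hstack([Z, np.ones((len(Z), 1))]) @ w))

# Network 1: flag the target subpopulation representative of the study
select_prob = fit_logistic(x_target, selected)(x_target)
representative = select_prob > 0.5

# Network 2 (not shown): a study-trained predictor would only be trusted on
# `representative`; the remaining patients are flagged instead of scored.
accuracy = np.mean(representative == (selected > 0.5))
coverage = representative.mean()
```

The design choice mirrored here is identification rather than correction: instead of reweighting the study data, the model abstains on the part of the target population the study cannot speak to.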


Who Are We Missing? A Principled Approach to Characterizing the Underrepresented Population

Parikh, Harsh, Ross, Rachael, Stuart, Elizabeth, Rudolph, Kara

arXiv.org Artificial Intelligence

Randomized controlled trials (RCTs) serve as the cornerstone for understanding causal effects, yet extending inferences to target populations presents challenges due to effect heterogeneity and underrepresentation. Our paper addresses the critical issue of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT optimizes the target subpopulation distribution by minimizing the variance of the target average treatment effect estimate, ensuring more precise treatment effect estimations. Notably, ROOT generates interpretable characteristics of the underrepresented population, aiding researchers in effective communication. Our approach demonstrates improved precision and interpretability compared to alternatives, as illustrated with synthetic data experiments. We apply our methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial -- investigating the effectiveness of medication for opioid use disorder -- to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations using ROOT, our framework offers a systematic approach to enhance decision-making accuracy and inform future trials in diverse populations.
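The variance-minimization intuition behind ROOT can be sketched numerically (hypothetical participation model and exclusion rule, not the authors' implementation): units the trial rarely enrols receive extreme inverse-participation weights, and excluding that underrepresented subgroup from the target population shrinks the weight variance that drives imprecision in the transported treatment-effect estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical target population with one covariate; the trial under-enrols
# high-x patients, so their inverse-participation weights explode.
n = 5000
x = rng.normal(size=n)                      # target-population covariate
p_trial = 1.0 / (1.0 + np.exp(3.0 * x))    # P(enrolled in trial | x)
w = 1.0 / p_trial                           # inverse-participation weights

var_full = w.var()                          # weight variance inflates TATE variance
refined = x < 1.0                           # drop the underrepresented subgroup
var_refined = w[refined].var()
excluded_share = 1.0 - refined.mean()
```

ROOT itself searches over interpretable tree-structured subgroups rather than a hand-picked threshold, but the payoff is the same: a refined target population whose effect estimate is far more precise, together with a readable description of who was left out.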


A Causal Inference Framework for Leveraging External Controls in Hybrid Trials

Valancius, Michael, Pang, Herb, Zhu, Jiawen, Cole, Stephen R, Funk, Michele Jonsson, Kosorok, Michael R

arXiv.org Machine Learning

We consider the challenges associated with causal inference in settings where data from a randomized trial is augmented with control data from an external source to improve efficiency in estimating the average treatment effect (ATE). Through the development of a formal causal inference framework, we outline sufficient causal assumptions about the exchangeability between the internal and external controls to identify the ATE and establish the connection to a novel graphical criterion. We propose estimators, review efficiency bounds, develop an approach for efficient doubly-robust estimation even when unknown nuisance models are estimated with flexible machine learning methods, and demonstrate finite-sample performance through a simulation study. To illustrate the ideas and methods, we apply the framework to a trial investigating the effect of risdiplam on motor function in patients with spinal muscular atrophy, for which there exists an external set of control patients from a previous trial.
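A minimal numeric sketch of the doubly-robust idea (synthetic data with oracle nuisance models for brevity; the paper's estimators use flexible ML with cross-fitting, and its exchangeability conditions are simply assumed to hold here by construction): external controls are pooled into the control arm of an AIPW estimate of the ATE.

```python
import numpy as np

rng = np.random.default_rng(2)

# 500 trial patients (1:1 randomised) plus 500 external controls; the
# external controls share the trial's outcome model, i.e. exchangeability
# holds in this simulation. True ATE = 2.
n = 1000
ext = np.arange(n) >= 500                             # external-control indicator
x = rng.normal(size=n)
a = np.where(ext, 0, rng.integers(0, 2, size=n))      # externals never treated
y = 1.0 + 0.5 * x + 2.0 * a + rng.normal(scale=0.5, size=n)

# Oracle outcome models (in practice: flexible ML, cross-fitted)
mu1 = 1.0 + 0.5 * x + 2.0
mu0 = 1.0 + 0.5 * x

# AIPW / doubly-robust estimate over the pooled sample; internal and
# external controls jointly inform the control-arm residual term.
e = a.mean()                                          # P(A=1) in the pooled sample
ate = np.mean(mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e))
```

The efficiency gain comes from the `(1 - a)` residual term averaging over twice as many controls; when exchangeability fails, that same term is exactly where bias from the external source enters.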


Researchers build first AI tool capable of identifying individual birds

#artificialintelligence

New research demonstrates for the first time that artificial intelligence (AI) can be used to train computers to recognize individual birds, a task humans are unable to do. The research is published in the British Ecological Society journal Methods in Ecology and Evolution. "We show that computers can consistently recognize dozens of individual birds, even though we cannot ourselves tell these individuals apart. In doing so, our study provides the means of overcoming one of the greatest limitations in the study of wild birds: reliably recognizing individuals," said Dr. André Ferreira at the Center for Functional and Evolutionary Ecology (CEFE), France, and lead author of the study.


Machine Learning Prediction of Mortality and Hospitalization in Heart Failure with Preserved Ejection Fraction

#artificialintelligence

Objectives: This study sought to develop models for predicting mortality and heart failure (HF) hospitalization for outpatients with HF with preserved ejection fraction (HFpEF) in the TOPCAT (Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist) trial.
Background: Although risk assessment models are available for patients with HF with reduced ejection fraction, few have assessed the risks of death and hospitalization in patients with HFpEF.
Methods: The following 5 methods: logistic regression with a forward selection of variables; logistic regression with a lasso regularization for variable selection; random forest (RF); gradient descent boosting; and support vector machine, were used to train models for assessing risks of mortality and HF hospitalization through 3 years of follow-up and were validated using 5-fold cross-validation. Model discrimination and calibration were estimated using receiver-operating characteristic curves and Brier scores, respectively. The top prediction variables were assessed by using the best performing models, using the incremental improvement of each variable in 5-fold cross-validation.
Results: The RF was the best performing model with a mean C-statistic of 0.72 (95% confidence interval [CI]: 0.69 to 0.75) for predicting mortality (Brier score: 0.17), and 0.76 (95% CI: 0.71 to 0.81) for HF hospitalization (Brier score: 0.19). Blood urea nitrogen levels, body mass index, and Kansas City Cardiomyopathy Questionnaire (KCCQ) subscale scores were strongly associated with mortality, whereas hemoglobin level, blood urea nitrogen, time since previous HF hospitalization, and KCCQ scores were the most significant predictors of HF hospitalization.
Conclusions: These models predict the risks of mortality and HF hospitalization in patients with HFpEF and emphasize the importance of health status data in determining prognosis.
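The evaluation pipeline described above can be sketched with scikit-learn (synthetic stand-in data; feature meanings and effect sizes are invented, not TOPCAT's): a random forest scored by 5-fold cross-validated C-statistic (discrimination) and Brier score (calibration).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(3)

# Synthetic cohort: 5 covariates standing in for predictors such as blood
# urea nitrogen, BMI, and KCCQ scores; only the first two carry signal.
n = 1000
X = rng.normal(size=(n, 5))
logit = 1.2 * X[:, 0] - 0.8 * X[:, 1] - 1.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
# Out-of-fold predicted probabilities from 5-fold cross-validation
prob = cross_val_predict(rf, X, y, cv=5, method="predict_proba")[:, 1]

c_statistic = roc_auc_score(y, prob)     # discrimination (C-statistic)
brier = brier_score_loss(y, prob)        # calibration (lower is better)
```

Using out-of-fold probabilities for both metrics, as here, avoids the optimistic bias of scoring a model on the same data it was fit to.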


Making Study Populations Visible through Knowledge Graphs

Chari, Shruthi, Qi, Miao, Agu, Nkcheniyere N., Seneviratne, Oshani, McCusker, James P., Bennett, Kristin P., Das, Amar K., McGuinness, Deborah L.

arXiv.org Machine Learning

Treatment recommendations within Clinical Practice Guidelines (CPGs) are largely based on findings from clinical trials and case studies, referred to here as research studies, that are often based on highly selective clinical populations, referred to here as study cohorts. When medical practitioners apply CPG recommendations, they need to understand how well their patient population matches the characteristics of those in the study cohort, and thus are confronted with the challenges of locating the study cohort information and making an analytic comparison. To address these challenges, we develop an ontology-enabled prototype system, which exposes the population descriptions in research studies in a declarative manner, with the ultimate goal of allowing medical practitioners to better understand the applicability and generalizability of treatment recommendations. We build a Study Cohort Ontology (SCO) to encode the vocabulary of study population descriptions, which are often reported in the first table of a published work and are thus often referred to as Table 1s. We leverage the widely used Semanticscience Integrated Ontology (SIO) for defining property associations between classes. Further, we model the key components of Table 1s, i.e., collections of study subjects, subject characteristics, and statistical measures, in RDF knowledge graphs. We design scenarios for medical practitioners to perform population analysis, and generate cohort similarity visualizations to determine the applicability of a study population to the clinical population of interest. Our semantic approach makes study populations visible through standardized representations of Table 1s, allowing users to quickly derive clinically relevant inferences about study populations.
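The knowledge-graph idea can be illustrated with plain subject–predicate–object triples (made-up predicate names, not the actual SCO/SIO vocabulary): a Table 1 cell such as "mean age 62.4 (SD 8.1) in arm 1" becomes machine-queryable statements about a study cohort.

```python
# A Table 1 row encoded as triples (toy vocabulary, not SCO/SIO terms)
triples = {
    (":trial1/arm1", "rdf:type", ":StudyArm"),
    (":trial1/arm1", ":hasCharacteristic", ":trial1/arm1/age"),
    (":trial1/arm1/age", ":onProperty", ":Age"),
    (":trial1/arm1/age", ":mean", "62.4"),
    (":trial1/arm1/age", ":standardDeviation", "8.1"),
}

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard,
    mimicking a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# A practitioner's question "what is the cohort's mean age?" becomes a query
mean_age = query(p=":mean")[0][2]
```

Once cohort descriptions live in this form, comparing a clinic's patient population against a study cohort reduces to graph queries rather than manual reading of Table 1s.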


A Computational Model of Reasoning from the Clinical Literature

Rennels, Glenn D., Shortliffe, Edward H., Stockdale, Frank E., Miller, Perry L.

AI Magazine

This article explores the premise that a formalized representation of empirical studies can play a central role in computer-based decision support. The specific motivations underlying this research include the following propositions: (1) Reasoning from experimental evidence contained in the clinical literature is central to the decisions physicians make in patient care. (2) A computational model based on a declarative representation for published reports of clinical studies can drive a computer program that selectively tailors knowledge of the clinical literature as it is applied to a particular case. (3) The development of such a computational model is an important first step toward filling a void in computer-based decision support systems. Furthermore, the model can help us better understand the general principles of reasoning from experimental evidence both in medicine and other domains. Roundsman is a developmental computer system that draws on structured representations of the clinical literature to critique plans for the management of primary breast cancer. Roundsman is able to produce patient-specific analyses of breast cancer-management options based on the 24 clinical studies currently encoded in its knowledge base. The Roundsman system is a first step in exploring how the computer can help bring a critical analysis of the relevant literature, structured around a particular patient and treatment decision, to the physician.
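The kind of declarative study representation described here can be sketched in a few lines (field names, example studies, and the matching rule are invented for illustration; Roundsman's actual representation is considerably richer): encoding each study's population and result as data lets a program select and tailor evidence for a particular patient.

```python
from dataclasses import dataclass

@dataclass
class ClinicalStudy:
    """A declarative record of one published study (toy schema)."""
    name: str
    population: dict       # eligibility characteristics, e.g. {"stage": "I"}
    treatment: str
    outcome_summary: str

def relevant_studies(studies, patient):
    """Keep only studies whose population description matches the patient."""
    return [s for s in studies
            if all(patient.get(k) == v for k, v in s.population.items())]

# Invented knowledge base of two studies
kb = [ClinicalStudy("Trial A", {"stage": "I"}, "lumpectomy + RT",
                    "equivalent survival to mastectomy"),
      ClinicalStudy("Trial B", {"stage": "III"}, "mastectomy",
                    "improved local control")]

# Evidence tailored to one patient rather than presented wholesale
hits = relevant_studies(kb, {"stage": "I", "age": 48})
```

Because the studies are data rather than prose, the same knowledge base can critique different treatment plans for different patients, which is the core of the proposition the article advances.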