study population
Accounting for Missing Covariates in Heterogeneous Treatment Estimation
Yamin, Khurram, Sharma, Vibhhu, Kennedy, Ed, Wilder, Bryan
Many applications of causal inference require using treatment effects estimated on a study population to make decisions in a separate target population. We consider the challenging setting where there are covariates that are observed in the target population that were not seen in the original study. Our goal is to estimate the tightest possible bounds on heterogeneous treatment effects conditioned on such newly observed covariates. We introduce a novel partial identification strategy based on ideas from ecological inference; the main idea is that estimates of conditional treatment effects for the full covariate set must marginalize correctly when restricted to only the covariates observed in both populations.
For example, if the initial study was an RCT, it may have failed to measure practically important covariates [Kahan et al., 2014] such as social determinants of health [Huang et al., 2024]. Since the intervention has not previously been used by the health system, no outcome data linked to these new covariates is available. However, treatment decisions would ideally reflect whether the intervention is likely to be beneficial to a patient conditional on all information available, not just covariates that happened to be in the original study. This paper studies the question: how precisely can we identify treatment effects conditional on such new covariates? If precise estimates are available, the decision maker can proceed confidently with deployment. Conversely, if considerable uncertainty remains about an important subgroup, a decision maker may exercise more caution or invest more resources in ...
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.93)
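The marginalization requirement in the abstract above can be illustrated with a minimal sketch. It is not the authors' estimator; it is the classical ecological-inference bound applied to the simplest case. The assumptions, none of which come from the listing, are: a single new binary covariate Z, a known share P(Z=1 | X=x), and conditional effects that lie in a known range [lo, hi]. The function name `subgroup_effect_bounds` is hypothetical.

```python
def subgroup_effect_bounds(tau_x, p_z1, lo=-1.0, hi=1.0):
    """Bounds on tau(x, z=1) and tau(x, z=0) under the constraint
    tau(x) = p * tau(x, 1) + (1 - p) * tau(x, 0).

    tau_x : effect conditional on the shared covariates only (from the study)
    p_z1  : P(Z = 1 | X = x) in the target population (assumed known here)
    lo, hi: assumed range of any conditional effect (an illustrative assumption)
    """
    if not 0.0 < p_z1 < 1.0:
        raise ValueError("p_z1 must lie strictly between 0 and 1")
    if not lo <= tau_x <= hi:
        raise ValueError("tau_x must lie within [lo, hi]")

    p1, p0 = p_z1, 1.0 - p_z1

    # Solve the constraint for one subgroup effect, then push the other
    # subgroup effect to its extremes and clip to the admissible range.
    z1_lower = max(lo, (tau_x - p0 * hi) / p1)
    z1_upper = min(hi, (tau_x - p0 * lo) / p1)
    z0_lower = max(lo, (tau_x - p1 * hi) / p0)
    z0_upper = min(hi, (tau_x - p1 * lo) / p0)
    return (z1_lower, z1_upper), (z0_lower, z0_upper)


if __name__ == "__main__":
    # Toy numbers, chosen only for illustration.
    bounds_z1, bounds_z0 = subgroup_effect_bounds(tau_x=0.2, p_z1=0.8)
    print("tau(x, z=1) in", bounds_z1)
    print("tau(x, z=0) in", bounds_z0)
```

With these toy inputs the large subgroup (Z=1, 80% of the population) gets a fairly narrow interval, while the small subgroup (Z=0) is not constrained at all beyond the assumed range. This is the kind of subgroup-level uncertainty the abstract says a decision maker would want to see before deployment.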
Sample Selection Bias in Machine Learning for Healthcare
Chauhan, Vinod Kumar, Clifton, Lei, Salaün, Achille, Lu, Huiqi Yvonne, Branson, Kim, Schwab, Patrick, Nigam, Gaurav, Clifton, David A.
While machine learning algorithms hold promise for personalised medicine, their clinical adoption remains limited. One critical factor contributing to this limited adoption is sample selection bias (SSB), which refers to the study population being less representative of the target population, leading to biased and potentially harmful decisions. Despite being well-known in the literature, SSB remains scarcely studied in machine learning for healthcare. Moreover, the existing techniques try to correct the bias by balancing distributions between the study and the target populations, which may result in a loss of predictive performance. To address these problems, our study illustrates the potential risks associated with SSB by examining SSB's impact on the performance of machine learning algorithms. Most importantly, we propose a new research direction for addressing SSB, based on target population identification rather than bias correction. Specifically, we propose two independent networks (T-Net) and a multitasking network (MT-Net) for addressing SSB, where one network/task identifies the target subpopulation which is representative of the study population and the second makes predictions for the identified subpopulation. Our empirical results with synthetic and semi-synthetic datasets highlight that SSB can lead to a large drop in the performance of an algorithm for the target population as compared with the study population, as well as a substantial difference in the performance for the target subpopulations that are representative of the selected and the non-selected patients from the study population. Furthermore, our proposed techniques demonstrate robustness across various settings, including different dataset sizes, event rates, and selection rates, outperforming the existing bias correction techniques.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
- Asia > China > Hong Kong (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Epidemiology (0.95)
- Health & Medicine > Health Care Technology (0.67)
Who Are We Missing? A Principled Approach to Characterizing the Underrepresented Population
Parikh, Harsh, Ross, Rachael, Stuart, Elizabeth, Rudolph, Kara
Randomized controlled trials (RCTs) serve as the cornerstone for understanding causal effects, yet extending inferences to target populations presents challenges due to effect heterogeneity and underrepresentation. Our paper addresses the critical issue of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT optimizes the target subpopulation distribution by minimizing the variance of the target average treatment effect estimate, ensuring more precise treatment effect estimations. Notably, ROOT generates interpretable characteristics of the underrepresented population, aiding researchers in effective communication. Our approach demonstrates improved precision and interpretability compared to alternatives, as illustrated with synthetic data experiments. We apply our methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial -- investigating the effectiveness of medication for opioid use disorder -- to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations using ROOT, our framework offers a systematic approach to enhance decision-making accuracy and inform future trials in diverse populations.
- North America > United States > South Carolina (0.04)
- North America > United States > Oregon (0.04)
- North America > United States > District of Columbia (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
A Causal Inference Framework for Leveraging External Controls in Hybrid Trials
Valancius, Michael, Pang, Herb, Zhu, Jiawen, Cole, Stephen R, Funk, Michele Jonsson, Kosorok, Michael R
We consider the challenges associated with causal inference in settings where data from a randomized trial is augmented with control data from an external source to improve efficiency in estimating the average treatment effect (ATE). Through the development of a formal causal inference framework, we outline sufficient causal assumptions about the exchangeability between the internal and external controls to identify the ATE and establish the connection to a novel graphical criterion. We propose estimators, review efficiency bounds, develop an approach for efficient doubly-robust estimation even when unknown nuisance models are estimated with flexible machine learning methods, and demonstrate finite-sample performance through a simulation study. To illustrate the ideas and methods, we apply the framework to a trial investigating the effect of risdiplam on motor function in patients with spinal muscular atrophy for which there exists an external set of control patients from a previous trial.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Researchers build first AI tool capable of identifying individual birds
New research demonstrates for the first time that artificial intelligence (AI) can be used to train computers to recognize individual birds, a task humans are unable to do. The research is published in the British Ecological Society journal Methods in Ecology and Evolution. "We show that computers can consistently recognize dozens of individual birds, even though we cannot ourselves tell these individuals apart. In doing so, our study provides the means of overcoming one of the greatest limitations in the study of wild birds--reliably recognizing individuals," said Dr. André Ferreira at the Center for Functional and Evolutionary Ecology (CEFE), France, and lead author of the study.
- Research Report > New Finding (0.73)
- Research Report > Experimental Study (0.73)
Machine Learning Prediction of Mortality and Hospitalization in Heart Failure with Preserved Ejection Fraction
Objectives: This study sought to develop models for predicting mortality and heart failure (HF) hospitalization for outpatients with HF with preserved ejection fraction (HFpEF) in the TOPCAT (Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist) trial. Background: Although risk assessment models are available for patients with HF with reduced ejection fraction, few have assessed the risks of death and hospitalization in patients with HFpEF. Methods: Five methods (logistic regression with forward selection of variables, logistic regression with lasso regularization for variable selection, random forest (RF), gradient descent boosting, and support vector machine) were used to train models for assessing risks of mortality and HF hospitalization through 3 years of follow-up, and the models were validated using 5-fold cross-validation. Model discrimination and calibration were estimated using receiver-operating characteristic curves and Brier scores, respectively. The top prediction variables were assessed by using the best performing models, using the incremental improvement of each variable in 5-fold cross-validation. Results: The RF was the best performing model with a mean C-statistic of 0.72 (95% confidence interval [CI]: 0.69 to 0.75) for predicting mortality (Brier score: 0.17), and 0.76 (95% CI: 0.71 to 0.81) for HF hospitalization (Brier score: 0.19). Blood urea nitrogen levels, body mass index, and Kansas City Cardiomyopathy Questionnaire (KCCQ) subscale scores were strongly associated with mortality, whereas hemoglobin level, blood urea nitrogen, time since previous HF hospitalization, and KCCQ scores were the most significant predictors of HF hospitalization. Conclusions: These models predict the risks of mortality and HF hospitalization in patients with HFpEF and emphasize the importance of health status data in determining prognosis.
- North America > United States > Missouri > Jackson County > Kansas City (0.25)
- Europe > Austria > Vienna (0.14)
- South America > Brazil (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
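The two evaluation metrics named in the abstract above, the C-statistic and the Brier score, can be sketched in a few lines for a binary outcome. This is a minimal illustration on made-up labels and predicted probabilities, not the study's data or code. The function names `brier_score` and `c_statistic` are hypothetical, and the pairwise definition of the C-statistic used here assumes a binary outcome, where it coincides with the area under the ROC curve.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1).
    Lower is better; 0 is a perfect forecast."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event case received the
    higher predicted probability; ties count as one half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


if __name__ == "__main__":
    # Toy labels and predictions, invented purely for illustration.
    labels = [1, 0, 1, 0]
    probs = [0.9, 0.2, 0.4, 0.6]
    print("Brier score:", round(brier_score(labels, probs), 4))
    print("C-statistic:", c_statistic(labels, probs))
```

The pairwise loop is quadratic in the number of patients, which is fine for a sketch; a production analysis would normally rely on an established statistics library and, as in the abstract, average the metrics over cross-validation folds.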
Making Study Populations Visible through Knowledge Graphs
Chari, Shruthi, Qi, Miao, Agu, Nkcheniyere N., Seneviratne, Oshani, McCusker, James P., Bennett, Kristin P., Das, Amar K., McGuinness, Deborah L.
Treatment recommendations within Clinical Practice Guidelines (CPGs) are largely based on findings from clinical trials and case studies, referred to here as research studies, that are often based on highly selective clinical populations, referred to here as study cohorts. When medical practitioners apply CPG recommendations, they need to understand how well their patient population matches the characteristics of those in the study cohort, and thus are confronted with the challenges of locating the study cohort information and making an analytic comparison. To address these challenges, we develop an ontology-enabled prototype system, which exposes the population descriptions in research studies in a declarative manner, with the ultimate goal of allowing medical practitioners to better understand the applicability and generalizability of treatment recommendations. We build a Study Cohort Ontology (SCO) to encode the vocabulary of study population descriptions, which are often reported in the first table of the published work and are therefore commonly referred to as Table 1. We leverage the well-used Semanticscience Integrated Ontology (SIO) for defining property associations between classes. Further, we model the key components of Table 1s, i.e., collections of study subjects, subject characteristics, and statistical measures in RDF knowledge graphs. We design scenarios for medical practitioners to perform population analysis, and generate cohort similarity visualizations to determine the applicability of a study population to the clinical population of interest. Our semantic approach to making study populations visible, through standardized representations of Table 1s, allows users to quickly derive clinically relevant inferences about study populations.
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > North Carolina > Wake County > Raleigh (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.90)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.70)
- Health & Medicine > Therapeutic Area > Neurology (0.46)
A Computational Model of Reasoning from the Clinical Literature
Rennels, Glenn D., Shortliffe, Edward H., Stockdale, Frank E., Miller, Perry L.
This article explores the premise that a formalized representation of empirical studies can play a central role in computer-based decision support. The specific motivations underlying this research include the following propositions: (1) Reasoning from experimental evidence contained in the clinical literature is central to the decisions physicians make in patient care. (2) A computational model based on a declarative representation for published reports of clinical studies can drive a computer program that selectively tailors knowledge of the clinical literature as it is applied to a particular case. (3) The development of such a computational model is an important first step toward filling a void in computer-based decision support systems. Furthermore, the model can help us better understand the general principles of reasoning from experimental evidence both in medicine and other domains. Roundsman is a developmental computer system that draws on structured representations of the clinical literature to critique plans for the management of primary breast cancer. Roundsman is able to produce patient-specific analyses of breast cancer-management options based on the 24 clinical studies currently encoded in its knowledge base. The Roundsman system is a first step in exploring how the computer can help bring a critical analysis of the relevant literature, structured around a particular patient and treatment decision, to the physician.
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > Connecticut > Fairfield County > Stamford (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.89)