Deep Bayesian Estimation for Dynamic Treatment Regimes with a Long Follow-up Time Artificial Intelligence

Causal effect estimation for dynamic treatment regimes (DTRs) contributes to sequential decision making. However, censoring and time-dependent confounding under DTRs are challenging: the sample of uncensored subjects shrinks over time while the feature dimension grows, and a long follow-up compounds both problems. A further challenge is the highly complex relationship between confounders, treatments, and outcomes, which causes traditional linear methods to fail. We combine outcome regression models with treatment models for high-dimensional features, using the small sample of uncensored subjects, and fit deep Bayesian outcome regression models to capture the complex relationships between confounders, treatments, and outcomes. The deep Bayesian models also quantify uncertainty, outputting a prediction variance that is essential for safety-aware applications such as self-driving cars and medical treatment design. Experimental results on medical simulations of HIV treatment show that the proposed method obtains stable and accurate dynamic causal effect estimates from observational data, especially under long-term follow-up. Our technique provides practical guidance for sequential decision making and policy making.
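The uncertainty-aware prediction the abstract describes can be illustrated with Monte Carlo dropout, one common approximation to deep Bayesian inference (this is an assumed stand-in, not the paper's actual architecture): keep dropout active at prediction time, sample the network many times, and report the mean and variance of the samples. The toy weights and the `mc_dropout_predict` helper below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(x, W1, W2, n_samples=200, p_drop=0.5):
    """Approximate Bayesian prediction: keep dropout active at test time
    and summarise the sampled outputs by their mean and variance."""
    preds = []
    for _ in range(n_samples):
        # resample a dropout mask for the hidden layer on every pass
        mask = rng.binomial(1, 1 - p_drop, size=W1.shape[1]) / (1 - p_drop)
        h = np.maximum(x @ W1 * mask, 0.0)   # ReLU hidden layer with dropout
        preds.append(float(h @ W2))
    preds = np.array(preds)
    return preds.mean(), preds.var()          # predictive mean and uncertainty

# Toy weights standing in for a trained outcome-regression network.
W1 = rng.normal(size=(3, 16))
W2 = rng.normal(size=(16,))
mean, var = mc_dropout_predict(np.array([0.2, -1.0, 0.5]), W1, W2)
```

A downstream decision rule could then refuse to act when `var` exceeds a safety threshold, which is the behaviour the abstract motivates for safety-aware applications.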

Google's Head of AI Talks About the Future of the EHR


This transcript has been edited for clarity. This is Eric Topol with Medicine and the Machine, with my co-host, Abraham Verghese. This is a special edition for us, to speak with one of the leading lights of artificial intelligence (AI) in the world, Jeff Dean, who heads up Google AI. Jeff Dean, PhD: Thank you for having me. Topol: You have now been at Google for 22 years. In a recent book by Cade Metz (a New York Times tech journalist) called Genius Makers, you are one of the protagonists. I didn't know this about you, but you grew up across the globe. Your parents took you from Hawaii, where you were born, to Somalia, where you helped run a refugee camp during your middle school years. As a high school senior in Georgia where your father worked at the CDC, you built a software tool for them that helped researchers collect disease data, and nearly four decades later it remains a staple of epidemiology across the developing world.

Two-Stage TMLE to Reduce Bias and Improve Efficiency in Cluster Randomized Trials Machine Learning

Cluster randomized trials (CRTs) randomly assign an intervention to groups of individuals (e.g., clinics or communities), and measure outcomes on individuals in those groups. While offering many advantages, this experimental design introduces challenges that are only partially addressed by existing analytic approaches. First, outcomes are often missing for some individuals within clusters. Failing to appropriately adjust for differential outcome measurement can result in biased estimates and inference. Second, CRTs often randomize limited numbers of clusters, resulting in chance imbalances on baseline outcome predictors between arms. Failing to adaptively adjust for these imbalances and other predictive covariates can result in efficiency losses. To address these methodological gaps, we propose and evaluate a novel two-stage targeted minimum loss-based estimator (TMLE) to adjust for baseline covariates in a manner that optimizes precision, after controlling for baseline and post-baseline causes of missing outcomes. Finite sample simulations illustrate that our approach can nearly eliminate bias due to differential outcome measurement, while other common CRT estimators yield misleading results and inferences. Application to real data from the SEARCH community randomized trial demonstrates the gains in efficiency afforded through adaptive adjustment for cluster-level covariates, after controlling for missingness on individual-level outcomes.
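The two-stage idea can be sketched in miniature (this is an illustrative inverse-probability-weighting stand-in for the paper's full TMLE, with simulated data and known measurement probabilities, all of which are assumptions): stage 1 adjusts each cluster's outcome for differential missingness, and stage 2 contrasts the adjusted cluster-level outcomes between arms.

```python
import numpy as np

rng = np.random.default_rng(1)

def cluster_level_outcome(y, observed, p_obs):
    """Stage 1: inverse-probability-weighted cluster mean, so individuals
    measured with low probability count more (a simplified stand-in for
    the paper's TMLE missingness adjustment)."""
    w = observed / p_obs
    return np.sum(w * np.nan_to_num(y)) / np.sum(w)

# Simulated CRT: 6 clusters, outcomes missing more often in the treated arm.
clusters = []
for arm in [0, 0, 0, 1, 1, 1]:
    y = rng.normal(loc=0.5 * arm, size=50)          # true arm effect = 0.5
    p_obs = np.full(50, 0.9 if arm == 0 else 0.6)   # differential measurement
    observed = rng.binomial(1, p_obs)
    y = np.where(observed == 1, y, np.nan)          # unmeasured outcomes
    clusters.append((arm, cluster_level_outcome(y, observed, p_obs)))

# Stage 2: contrast arm-level means of the adjusted cluster outcomes.
arm_means = {a: np.mean([m for arm, m in clusters if arm == a]) for a in (0, 1)}
effect = arm_means[1] - arm_means[0]
```

An unadjusted complete-case mean would conflate the treatment effect with the measurement process; the weighting in stage 1 is what removes that bias before the between-arm contrast.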

Mandates as We Near Herd Immunity? AI and Machine Learning Have Answers


In spite of the debates and partisan politics we can't seem to avoid no matter where we turn, everybody in the United States and the world genuinely wants the same thing: to return to our normal lives and avoid individual and global-scale financial crises without contracting or spreading COVID-19. But until herd immunity is reached, which seems unlikely at the current rate of inoculation, we face a seemingly unsolvable challenge in knowing exactly how to keep the recent variants from spreading while not hurting communities by shutting down schools, businesses, and cities unnecessarily. Who can we listen to? How do we know which activities are safe and which should be avoided? Countries such as England, Germany, Ireland, Israel, Italy, Belgium and Lebanon are extending national lockdowns.

AI-Powered Drug Development in a Post-COVID World


The developed world is on the cusp of turning the corner in the fight against COVID-19, thanks to the unprecedented effort to rapidly develop and distribute effective vaccines. Now technologists are hoping to take drug development to the next level, and AI will play a big role. One of the companies at the forefront of using machine learning and AI to develop drugs is CytoReason. The company helps pharmaceutical firms like Pfizer accelerate drug development by providing high-resolution models of the human body infected with the disease that the drug companies are targeting. "If I told you that in 200 years, drugs would be developed in a computer, you would not be real surprised," said CytoReason CEO and founder David Harel.

Pyfectious: An individual-level simulator to discover optimal containment policies for epidemic diseases Artificial Intelligence

Simulating the spread of infectious diseases in human communities is critical for predicting the trajectory of an epidemic and verifying various policies to control the devastating impacts of the outbreak. Many existing simulators are based on compartment models that divide people into a few subsets and simulate the dynamics among those subsets using hypothesized differential equations. However, these models lack the requisite granularity to study the effect of intelligent policies that influence every individual in a particular way. In this work, we introduce a simulator software capable of modeling a population structure and controlling the disease's propagation at the individual level. In order to estimate the confidence of the conclusions drawn from the simulator, we employ a comprehensive probabilistic approach where the entire population is constructed as a hierarchical random variable. This approach makes the inferred conclusions more robust against sampling artifacts and gives confidence bounds for decisions based on the simulation results. To showcase potential applications, the simulator parameters are set based on the official statistics of the COVID-19 pandemic, and the outcome of a wide range of control measures is investigated. Furthermore, the simulator is used as the environment of a reinforcement learning problem to find the optimal policies to control the pandemic. The experimental results indicate the simulator's adaptability and capacity for making sound predictions, and demonstrate a successful policy derivation based on real-world data. As an exemplary application, our results show that the proposed policy discovery method can lead to control measures that produce significantly fewer infected individuals in the population and protect the health system against saturation.
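The contrast with compartment models can be made concrete with a minimal individual-level epidemic loop (a toy SIR sketch with assumed parameters, far simpler than Pyfectious's hierarchical population model): because every person carries their own state, per-person interventions such as quarantining one individual are trivial to express, whereas a compartmental ODE cannot distinguish individuals at all.

```python
import random

random.seed(0)

def simulate(n=500, beta=0.08, gamma=0.05, contacts=10, steps=100):
    """Minimal individual-level SIR epidemic: each person has their own
    S/I/R state, so interventions targeting specific individuals are
    easy to express -- unlike compartmental ODE models."""
    state = ["S"] * n
    for i in range(5):
        state[i] = "I"                               # seed infections
    history = []
    for _ in range(steps):
        infected = [i for i, s in enumerate(state) if s == "I"]
        for i in infected:
            for j in random.sample(range(n), contacts):  # random daily contacts
                if state[j] == "S" and random.random() < beta:
                    state[j] = "I"                   # transmission
            if random.random() < gamma:
                state[i] = "R"                       # recovery
        history.append(state.count("I"))
    return history

curve = simulate()
peak = max(curve)
```

A containment policy in this framing is just a rule that edits the contact structure or the per-individual states between steps, which is exactly the kind of intervention the abstract says compartment models cannot represent.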

NCoRE: Neural Counterfactual Representation Learning for Combinations of Treatments Machine Learning

Estimating an individual's potential response to interventions from observational data is of high practical relevance for many domains, such as healthcare, public policy or economics. In this setting, it is often the case that combinations of interventions may be applied simultaneously, for example, multiple prescriptions in healthcare or different fiscal and monetary measures in economics. However, existing methods for counterfactual inference are limited to settings in which actions are not used simultaneously. Here, we present Neural Counterfactual Relation Estimation (NCoRE), a new method for learning counterfactual representations in the combination treatment setting that explicitly models cross-treatment interactions. NCoRE is based on a novel branched conditional neural representation that includes learnt treatment interaction modulators to infer the potential causal generative process underlying the combination of multiple treatments. Our experiments show that NCoRE significantly outperforms existing state-of-the-art methods for counterfactual treatment effect estimation that do not account for the effects of combining multiple treatments across several synthetic, semi-synthetic and real-world benchmarks.
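The branched architecture with interaction modulators can be sketched as a plain numpy forward pass (the shapes, weights, and the multiplicative `mod` term below are illustrative assumptions, not the published NCoRE architecture): a shared covariate encoding feeds one branch per treatment, and a learned interaction matrix rescales each active branch according to which other treatments are applied alongside it.

```python
import numpy as np

rng = np.random.default_rng(2)
d, h, n_treat = 5, 8, 3

# Shared representation plus one branch per treatment; an interaction
# matrix modulates branch outputs when treatments are combined.
W_shared = rng.normal(size=(d, h))
W_branch = rng.normal(size=(n_treat, h))
interaction = rng.normal(scale=0.1, size=(n_treat, n_treat))
np.fill_diagonal(interaction, 0.0)        # no self-modulation

def predict(x, treatments):
    """treatments: binary vector, 1 = treatment applied."""
    phi = np.tanh(x @ W_shared)                    # shared covariate encoding
    base = W_branch @ phi                          # per-treatment branch outputs
    mod = 1.0 + interaction @ treatments           # cross-treatment modulators
    return float(np.sum(treatments * base * mod))  # combined-outcome prediction

x = rng.normal(size=d)
y_a = predict(x, np.array([1, 0, 0]))              # treatment A alone
y_ab = predict(x, np.array([1, 1, 0]))             # A combined with B
```

Because `mod` depends on the whole treatment vector, adding treatment B changes the contribution of treatment A's branch as well, which is the cross-treatment interaction effect a purely additive model would miss.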

Evidence-Based Policy Learning Machine Learning

The past years have seen the development and deployment of machine-learning algorithms to estimate personalized treatment-assignment policies from randomized controlled trials. Yet such algorithms typically optimize expected outcomes without taking into account that treatment assignments are frequently subject to hypothesis testing. In this article, we explicitly take significance testing of the effect of treatment-assignment policies into account, and consider assignments that optimize the probability of finding a subset of individuals with a statistically significant positive treatment effect. We provide an efficient implementation using decision trees, and demonstrate its gain over selecting subsets based on positive (estimated) treatment effects. Compared to standard tree-based regression and classification tools, this approach tends to yield substantially higher power in detecting subgroups with positive treatment effects.

INTRODUCTION

Recent years have seen the development of machine-learning algorithms that estimate heterogeneous causal effects from randomized controlled trials. While the estimation of average effects - for example, how effective a vaccine is overall, whether a conditional cash transfer reduces poverty, or which ad leads to more clicks - can inform the decision whether to deploy a treatment or not, heterogeneous treatment effect estimation allows us to decide who should get treated. These algorithms aim to maximize realized outcomes, and thus focus on assigning treatment to individuals with positive (estimated) treatment effects. Yet in practice, the deployment of assignment policies often only happens after passing a test that the assignment produces a positive net effect relative to some status quo. For example, a drug manufacturer may have to demonstrate that the drug is effective on the target population by submitting a hypothesis test to the FDA for approval.
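The distinction between "positive estimated effect" and "statistically significant effect" can be seen in a small simulation (the subgroup sizes, effect sizes, and one-sided 5% threshold below are illustrative assumptions; the paper's actual method searches over subgroups with decision trees): a subgroup's one-sample t-statistic, not just its mean effect, determines whether it would pass a hypothesis test.

```python
import numpy as np

rng = np.random.default_rng(3)

def t_stat(effects):
    """One-sample t-statistic for testing mean effect > 0."""
    n = len(effects)
    return effects.mean() / (effects.std(ddof=1) / np.sqrt(n))

# Simulated per-individual effect estimates in two candidate subgroups:
# a large group with a weak effect vs. a smaller group with a strong one.
large_weak = rng.normal(loc=0.05, scale=1.0, size=400)
small_strong = rng.normal(loc=0.8, scale=1.0, size=60)

t_weak = t_stat(large_weak)
t_strong = t_stat(small_strong)
passes = {"large_weak": t_weak > 1.645,      # one-sided 5% threshold
          "small_strong": t_strong > 1.645}
```

Both groups have positive estimated mean effects, yet only a subgroup whose t-statistic clears the threshold would survive the pre-deployment test; an assignment rule that optimizes for significance therefore trades off subgroup size against effect strength rather than simply treating everyone with a positive estimate.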

Interpretable bias mitigation for textual data: Reducing gender bias in patient notes while maintaining classification performance Machine Learning

Medical systems in general, and patient treatment decisions and outcomes in particular, are affected by bias based on gender and other demographic elements. As language models are increasingly applied to medicine, there is a growing interest in building algorithmic fairness into processes impacting patient care. Much of the work addressing this question has focused on biases encoded in language models -- statistical estimates of the relationships between concepts derived from distant reading of corpora. Building on this work, we investigate how word choices made by healthcare practitioners and language models interact with regard to bias. We identify and remove gendered language from two clinical-note datasets and describe a new debiasing procedure using BERT-based gender classifiers. We show minimal degradation in health condition classification tasks for low to medium levels of bias removal via data augmentation. Finally, we compare the bias semantically encoded in the language models with the bias empirically observed in health records. This work outlines an interpretable approach for using data augmentation to identify and reduce the potential for bias in natural language processing pipelines.
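The data-augmentation step of removing gendered language can be sketched with simple token substitution (the `NEUTRALIZE` map and `degender` helper are hypothetical; the paper's actual pipeline uses curated term lists and BERT-based gender classifiers to validate how much gender signal remains):

```python
import re

# Hypothetical gendered-term map; a real pipeline would use a curated
# clinical lexicon and a trained classifier to measure residual bias.
NEUTRALIZE = {
    "he": "the patient", "she": "the patient",
    "his": "their", "her": "their",
    "man": "person", "woman": "person",
}

def degender(note: str) -> str:
    """Replace gendered tokens with neutral ones, matching whole words
    only so substrings like 'chest' or 'this' are left untouched."""
    pattern = re.compile(r"\b(" + "|".join(NEUTRALIZE) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: NEUTRALIZE[m.group(0).lower()], note)

note = "She reports chest pain; her ECG was normal."
clean = degender(note)
# clean == "the patient reports chest pain; their ECG was normal."
```

Running a downstream classifier on both the original and the degendered note, as the paper does, is what quantifies how much predictive performance the bias removal costs.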

Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities Artificial Intelligence

Advances in algorithmic fairness have largely omitted sexual orientation and gender identity. We explore queer concerns in privacy, censorship, language, online safety, health, and employment to study the positive and negative effects of artificial intelligence on queer communities. These issues underscore the need for new directions in fairness research that take into account a multiplicity of considerations, from privacy preservation, context sensitivity and process fairness, to an awareness of sociotechnical impact and the increasingly important role of inclusive and participatory research processes. Most current approaches for algorithmic fairness assume that the target characteristics for fairness--frequently, race and legal gender--can be observed or recorded. Sexual orientation and gender identity are prototypical instances of unobserved characteristics, which are frequently missing, unknown or fundamentally unmeasurable. This paper highlights the importance of developing new approaches for algorithmic fairness that break away from the prevailing assumption of observed characteristics.