
Estimating heterogeneous treatment effects with right-censored data via causal survival forests Machine Learning

There is a fast-growing literature on estimating heterogeneous treatment effects via random forests in observational studies. However, there are few approaches available for right-censored survival data. In clinical trials, right-censored survival data are frequently encountered. Quantifying the causal relationship between a treatment and the survival outcome is of great interest. Random forests provide a robust, nonparametric approach to statistical estimation. In addition, recent developments allow forest-based methods to quantify the uncertainty of the estimated heterogeneous treatment effects. We propose causal survival forests that directly target the treatment effect in an observational study. We establish consistency and asymptotic normality of the proposed estimators and provide an estimator of the asymptotic variance that enables valid confidence intervals for the estimated treatment effect. The performance of our approach is demonstrated via extensive simulations and data from an HIV study.

High-dimensional regression adjustments in randomized experiments Machine Learning

We study the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information, and show that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect. Our results considerably extend the range of settings where high-dimensional regression adjustments are guaranteed to provide valid inference about the population average treatment effect. We then propose cross-estimation, a simple method for obtaining finite-sample-unbiased treatment effect estimates that leverages high-dimensional regression adjustments. Our method can be used when the regression model is estimated using the lasso, the elastic net, subset selection, etc. Finally, we extend our analysis to allow for adaptive specification search via cross-validation, and flexible non-parametric regression adjustments with machine learning methods such as random forests or neural networks.
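The idea of combining regression adjustment with sample splitting can be illustrated with a minimal sketch. This is not the paper's exact cross-estimation estimator: it is a generic cross-fitted, regression-adjusted difference in means, assuming a binary randomized treatment `w`, and it uses ridge regression as a stand-in for the lasso or elastic net. The function names `fit_ridge`, `cross_fitted_ate`, and `simulate` are hypothetical and chosen for illustration only.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Ridge regression with an unpenalized intercept; returns a predict function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda X_new: y_mean + (X_new - x_mean) @ beta


def cross_fitted_ate(X, y, w, n_folds=5, lam=1.0, seed=0):
    """Regression-adjusted ATE estimate where each fold is adjusted using
    outcome models fitted only on the other folds (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % n_folds  # random, near-equal fold sizes
    estimate = 0.0
    for k in range(n_folds):
        test = fold == k
        train = ~test
        # Separate outcome models for treated and control, fitted out of fold.
        mu1 = fit_ridge(X[train & (w == 1)], y[train & (w == 1)], lam)
        mu0 = fit_ridge(X[train & (w == 0)], y[train & (w == 0)], lam)
        t1, t0 = test & (w == 1), test & (w == 0)
        # Predicted effect on the fold plus mean residual corrections per arm.
        tau_k = (
            np.mean(mu1(X[test]) - mu0(X[test]))
            + np.mean(y[t1] - mu1(X[t1]))
            - np.mean(y[t0] - mu0(X[t0]))
        )
        estimate += test.mean() * tau_k  # weight by the fold's share of the sample
    return estimate


def simulate(n=2000, p=20, tau=2.0, seed=1):
    """Synthetic randomized experiment with a constant treatment effect tau."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    w = rng.integers(0, 2, size=n)  # Bernoulli(1/2) treatment assignment
    y = X[:, 0] + 0.5 * X[:, 1] + tau * w + rng.normal(size=n)
    return X, y, w


if __name__ == "__main__":
    X, y, w = simulate()
    print(cross_fitted_ate(X, y, w))
```

Because the outcome models used on a fold never see that fold's outcomes, the adjustment cannot overfit the data it is applied to; the residual correction terms keep the estimate anchored to the observed arm means.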

Estimating Bayesian Optimal Treatment Regimes for Dichotomous Outcomes using Observational Data Machine Learning

Optimal treatment regimes (OTR) are individualised treatment assignment strategies that identify a medical treatment as optimal given all background information available on the individual. We discuss Bayes optimal treatment regimes estimated using a loss function defined on the bivariate distribution of dichotomous potential outcomes. The proposed approach accommodates more general objectives for the OTR than maximization of an expected outcome (e.g., survival probability) by taking into account, for example, unnecessary treatment burden. As a motivating example, we consider the case of oropharynx cancer treatment, where unnecessary burden due to chemotherapy is to be avoided while maximizing survival chances. Assuming ignorable treatment assignment, we describe Bayesian inference about the OTR, including a sensitivity analysis on the unobserved partial association of the potential outcomes. We evaluate the methodology by simulations that apply Bayesian parametric and more flexible non-parametric outcome models. The proposed OTR for oropharynx cancer reduces the frequency of the more burdensome chemotherapy assignment by approximately 75% without reducing the average survival probability. This regime thus offers a strong increase in the expected quality of life of patients.

Machine Learning Methods Economists Should Know About Machine Learning

We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.

Generalized Random Forests Machine Learning

We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method operates at a particular point in covariate space by considering a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian, and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: non-parametric quantile regression, conditional average partial effect estimation, and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
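The estimation scheme described above can be written out compactly. The notation here is an assumption introduced for illustration, since the abstract gives none: $O_i$ denotes the observed data for unit $i$, $X_i$ its covariates, $\psi$ the scoring function defining the moment equations, $\theta(x)$ the quantity of interest, $\nu(x)$ an optional nuisance parameter, and $L_b(x)$ the set of training examples falling in the same leaf as $x$ in tree $b$ of a forest with $B$ trees.

```latex
\begin{align*}
  % Local moment equations identifying the target \theta(x)
  \mathbb{E}\bigl[\psi_{\theta(x),\,\nu(x)}(O_i) \mid X_i = x\bigr] &= 0, \\
  % Forest-based adaptive weights: how often X_i shares a leaf with x
  \alpha_i(x) &= \frac{1}{B} \sum_{b=1}^{B}
      \frac{\mathbf{1}\{X_i \in L_b(x)\}}{|L_b(x)|}, \\
  % Weighted empirical version solved at the query point x
  \bigl(\hat\theta(x), \hat\nu(x)\bigr) &\in
      \operatorname*{argmin}_{\theta,\,\nu}
      \Bigl\| \sum_{i=1}^{n} \alpha_i(x)\, \psi_{\theta,\nu}(O_i) \Bigr\|_2 .
\end{align*}
```

The weights $\alpha_i(x)$ are non-negative and sum to one, so they play the role that kernel weights play in classical local estimation, except that the neighbourhood is learned by the forest rather than fixed in advance.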