Learning Conditional Average Treatment Effects in Regression Discontinuity Designs using Bayesian Additive Regression Trees

Alcantara, Rafael, Hahn, P. Richard, Carvalho, Carlos, Lopes, Hedibert

arXiv.org Machine Learning 

Regression discontinuity designs (RDDs) arise when treatment assignment is determined by whether a particular covariate -- referred to as the running variable -- lies above or below a known value, referred to as the cutoff. Because treatment is deterministically assigned as a known function of the running variable, RDDs are trivially deconfounded: treatment assignment is independent of the outcome variable given the running variable (because treatment is conditionally constant). However, estimating treatment effects in RDDs is more complicated than simply controlling for the running variable, because doing so introduces a complete lack of overlap, the other key condition needed to justify regression adjustment for causal inference. Nonetheless, treatment effects at the cutoff may still be identified: it is well known that they can be estimated as the magnitude of the discontinuity in the conditional mean response function at that point (Hahn et al., 2001). This paper investigates the use of Bayesian additive regression tree (BART) models (Chipman et al., 2010; Hahn et al., 2020) for estimating conditional average treatment effects (CATEs) at the cutoff, conditional on observed covariates other than the running variable. To the best of our knowledge, such data-driven CATE estimation has not been a focus of the existing RDD literature, and we are the first to propose BART for this purpose.
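The identification strategy described above -- recovering the treatment effect at the cutoff as the size of the jump in the conditional mean of the outcome -- can be illustrated with a minimal simulation. The sketch below is not the paper's BART-based method; it uses the standard local linear approach of fitting weighted regressions separately on each side of the cutoff (with an assumed, untuned bandwidth and triangular kernel) and differencing the two fitted intercepts:

```python
# Minimal RDD sketch: estimate the treatment effect at the cutoff as the
# discontinuity in the conditional mean, via side-by-side local linear fits.
# This is a standard illustration, not the BART approach the paper proposes;
# the bandwidth h and kernel choice below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
cutoff = 0.0
n = 2000

x = rng.uniform(-1, 1, n)            # running variable
z = (x >= cutoff).astype(float)      # deterministic treatment assignment
tau = 2.0                            # true effect at the cutoff
y = 1.0 + 0.5 * x + tau * z + rng.normal(0.0, 0.3, n)

def fitted_mean_at_cutoff(x_side, y_side, h):
    """Weighted least squares of y on (x - cutoff) using a triangular
    kernel; the intercept estimates the conditional mean at the cutoff."""
    d = x_side - cutoff
    w = np.clip(1.0 - np.abs(d) / h, 0.0, None)   # triangular kernel weights
    X = np.column_stack([np.ones_like(d), d])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y_side * sw, rcond=None)
    return beta[0]

h = 0.3                                           # bandwidth (assumed)
left = x < cutoff
mu_left = fitted_mean_at_cutoff(x[left], y[left], h)
mu_right = fitted_mean_at_cutoff(x[~left], y[~left], h)
tau_hat = mu_right - mu_left                      # estimated jump at cutoff
print(round(tau_hat, 2))                          # should be near tau = 2.0
```

The paper's contribution replaces these one-dimensional local fits with a BART model that also conditions on covariates beyond the running variable, so the estimated jump can vary with those covariates (a CATE) rather than being a single number.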