doubleml
Forests for Differences: Robust Causal Inference Beyond Parametric DiD
Souto, Hugo Gobato, Neto, Francisco Louzada
This paper introduces the Difference-in-Differences Bayesian Causal Forest (DiD-BCF), a novel non-parametric model addressing key challenges in DiD estimation, such as staggered adoption and heterogeneous treatment effects. DiD-BCF provides a unified framework for estimating Average (ATE), Group-Average (GATE), and Conditional Average Treatment Effects (CATE). A core innovation, its Parallel Trends Assumption (PTA)-based reparameterization, enhances estimation accuracy and stability in complex panel data settings. Extensive simulations demonstrate DiD-BCF's superior performance over established benchmarks, particularly under non-linearity, selection biases, and effect heterogeneity. Applied to U.S. minimum wage policy, the model uncovers significant conditional treatment effect heterogeneity related to county population, insights obscured by traditional methods. DiD-BCF offers a robust and versatile tool for more nuanced causal inference in modern DiD applications.
DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python
Bach, Philipp, Chernozhukov, Victor, Kurz, Malte S., Spindler, Martin
DoubleML is an open-source Python library implementing the double machine learning framework of Chernozhukov et al. (2018) for a variety of causal models. It provides functionality for valid statistical inference on causal parameters when the nuisance parameters are estimated with machine learning methods. The object-oriented implementation of DoubleML offers high flexibility in model specification and makes the package easily extensible. The package is distributed under the MIT license and relies on core libraries from the scientific Python ecosystem: scikit-learn, numpy, pandas, scipy, statsmodels and joblib.
DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R
Bach, Philipp, Chernozhukov, Victor, Kurz, Malte S., Spindler, Martin
Structural equation models provide a quintessential framework for conducting causal inference in statistics, econometrics, machine learning (ML), and other data sciences. The package DoubleML for R (R Core Team, 2020) implements partially linear and interactive structural equation and treatment effect models with high-dimensional confounding variables as considered in Chernozhukov et al. (2018). Estimation and tuning of the machine learning models are based on the functionality provided by the mlr3 package and the mlr3 ecosystem (Lang et al., 2019). Key elements of double machine learning (DML) models are the score functions that identify the estimates of the target parameter. These functions play an essential role in valid inference with machine learning methods because they have to satisfy a property called Neyman orthogonality. With the score functions as key elements, DoubleML implements double machine learning in a very general way using object orientation based on the R6 package (Chang, 2020). Currently, DoubleML implements the double / debiased machine learning framework as established in Chernozhukov et al. (2018) for partially linear regression models (PLR), partially linear instrumental variable regression models (PLIV), interactive regression models (IRM), and interactive instrumental variable regression models (IIVM). The object-oriented implementation of DoubleML is very flexible. The model classes DoubleMLPLR, DoubleMLPLIV, DoubleMLIRM and DoubleMLIIVM implement the estimation of the nuisance functions via machine learning methods and the computation of the Neyman-orthogonal score function.
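To make the role of the score function concrete, the following is a minimal sketch of the partialling-out estimator for a partially linear regression model Y = theta * D + g(X) + e, D = m(X) + v. It is not the DoubleML API: the function names (fit_binned_mean, dml_plr, simulate), the binned-mean learner standing in for a machine learning method, and the simulated data are all assumptions made for illustration. The nuisance functions l(X) = E[Y | X] and m(X) = E[D | X] are estimated on held-out folds (cross-fitting), and theta solves the empirical moment condition of the score psi = (Y - l(X) - theta * (D - m(X))) * (D - m(X)).

```python
import math
import random


def fit_binned_mean(xs, ts, n_bins=20):
    """Piecewise-constant regression of ts on xs in [0, 1).

    Hypothetical stand-in for an arbitrary ML learner.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, t in zip(xs, ts):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += t
        counts[b] += 1
    overall = sum(ts) / len(ts)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def dml_plr(y, d, x, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta and its standard error."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_y = [0.0] * n
    res_d = [0.0] * n
    for k, test in enumerate(folds):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        # Nuisance estimates are fitted on the other folds only.
        l_hat = fit_binned_mean([x[i] for i in train], [y[i] for i in train])
        m_hat = fit_binned_mean([x[i] for i in train], [d[i] for i in train])
        for i in test:
            res_y[i] = y[i] - l_hat(x[i])
            res_d[i] = d[i] - m_hat(x[i])
    # Solve sum_i psi_i(theta) = 0 with psi = (res_y - theta * res_d) * res_d.
    theta = sum(u * v for u, v in zip(res_y, res_d)) / sum(v * v for v in res_d)
    # Plug-in variance: E[psi^2] / (E[res_d^2])^2, scaled by 1 / n.
    j = sum(v * v for v in res_d) / n
    psi_sq = sum(((u - theta * v) * v) ** 2 for u, v in zip(res_y, res_d)) / n
    se = math.sqrt(psi_sq / (j * j) / n)
    return theta, se


def simulate(n=4000, theta=0.5, seed=1):
    """Simulated data (an assumption): X confounds D and Y non-linearly."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    d = [math.sin(2 * math.pi * xi) + rng.gauss(0, 1) for xi in x]
    y = [theta * di + math.cos(2 * math.pi * xi) + rng.gauss(0, 1)
         for di, xi in zip(d, x)]
    return y, d, x


if __name__ == "__main__":
    y, d, x = simulate()
    theta_hat, se = dml_plr(y, d, x)
    print(f"theta_hat = {theta_hat:.3f}, se = {se:.3f}")
```

Because the score is Neyman-orthogonal, first-order errors in the estimated nuisance functions do not affect the estimate of theta, which is why a crude learner like the binned mean can still be used here. In the packages themselves this logic sits inside the model classes, with the learners supplied by the user.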