Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls

Nick Doudchenko, Khashayar Khosravi, Jean Pouget-Abadie, Sebastien Lahaie, Miles Lubin, Vahab Mirrokni, Jann Spiess, Guido Imbens

arXiv.org Machine Learning 

Randomized experiments have long been a staple of applied causal inference. In his seminal paper, Rubin (1974) suggests that "given a choice between the data from a randomized experiment and an equivalent nonrandomized study, one should choose the data from the experiment, especially in the social sciences where much of the variability is often unassigned to particular causes." In the language of Rubin's potential-outcomes framework, randomization guarantees that treatment status is independent of the potential outcomes. As a result, the simple and intuitive difference-in-means estimator, which compares the average outcomes of the treatment and control units, is unbiased for the average treatment effect (ATE). If both the treatment and control samples are sufficiently large, the hope is that this difference-in-means estimate is also close to the population mean of the treatment effect. Another crucial property of randomized experimental designs is their robustness to alternative assumptions about the data-generating process: a completely randomized experiment does not take into account any features of the observed data, so its validity does not depend on a model of those data.
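The unbiasedness claim above can be checked with a small simulation. The following is a minimal sketch rather than anything taken from the paper: the potential outcomes, the sample sizes, and the function names `difference_in_means` and `simulate` are all illustrative assumptions. It fixes one set of potential outcomes, repeatedly draws a completely randomized assignment, and compares the average of the difference-in-means estimates with the true ATE of that sample.

```python
import random
import statistics


def difference_in_means(outcomes, treated):
    """Mean observed outcome of treated units minus that of control units."""
    treated_outcomes = [y for y, w in zip(outcomes, treated) if w]
    control_outcomes = [y for y, w in zip(outcomes, treated) if not w]
    return statistics.fmean(treated_outcomes) - statistics.fmean(control_outcomes)


def simulate(n_units=100, n_treated=50, n_draws=2000, seed=0):
    """Return (true sample ATE, average estimate over random assignments)."""
    rng = random.Random(seed)

    # Hypothetical potential outcomes, held fixed across all assignments.
    # The unit-level effect is 1.0 plus noise; these numbers are assumptions.
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    y1 = [y + 1.0 + rng.gauss(0.0, 0.5) for y in y0]
    true_ate = statistics.fmean(a - b for a, b in zip(y1, y0))

    estimates = []
    for _ in range(n_draws):
        # Complete randomization: exactly n_treated units are treated,
        # chosen uniformly at random without looking at any data.
        treated_ids = set(rng.sample(range(n_units), n_treated))
        treated = [i in treated_ids for i in range(n_units)]
        # Only one potential outcome per unit is ever observed.
        observed = [y1[i] if treated[i] else y0[i] for i in range(n_units)]
        estimates.append(difference_in_means(observed, treated))

    return true_ate, statistics.fmean(estimates)


if __name__ == "__main__":
    true_ate, mean_estimate = simulate()
    print(f"true ATE: {true_ate:.3f}, mean estimate: {mean_estimate:.3f}")
```

Any single estimate can still be far from the true ATE. Unbiasedness only says that the estimates average out to it over repeated randomizations, which is why the sample sizes of the two groups matter for precision.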