This paper proposes an alternative to the synthetic control method (SCM) for estimating the effect of a policy intervention on an outcome over time. Encoder-decoder recurrent neural networks (RNNs) are used to predict counterfactual time-series of treated unit outcomes using only the outcomes of control units as inputs. Unlike SCM, the proposed method does not rely on pre-intervention covariates, allows for nonconvex combinations of control units, and can handle multiple treated units. In empirical and simulated data applications, RNN-based models outperform SCM in terms of predictive accuracy while using much less information to produce counterfactual predictions.
We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.
Estimating treatment effects from observational data is challenging due to the missing counterfactuals. Matching is an effective strategy to tackle this problem. Widely used matching estimators such as nearest neighbor matching (NNM) pair the treated units with the most similar control units in terms of covariates, and then estimate treatment effects accordingly. However, existing matching estimators perform poorly when the distributions of the control and treatment groups are unbalanced. Moreover, theoretical analysis suggests that the bias of causal effect estimation increases with the dimension of the covariates. In this paper, we aim to address these problems by learning low-dimensional balanced and nonlinear representations (BNR) for observational data. In particular, we convert counterfactual prediction into a classification problem, develop a kernel learning model with a domain adaptation constraint, and design a novel matching estimator. The dimension of the covariates is significantly reduced after projecting the data to a low-dimensional subspace. Experiments on several synthetic and real-world datasets demonstrate the effectiveness of our approach.
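For readers unfamiliar with the baseline, the following is a minimal sketch of plain nearest neighbor matching on raw covariates, not the BNR estimator proposed above. The function name `nnm_att`, the `(covariates, outcome)` data layout, and the toy numbers are all assumptions made for illustration.

```python
from math import dist

def nnm_att(treated, controls):
    """Average treatment effect on the treated via 1-nearest-neighbor matching.

    treated, controls: lists of (covariates, outcome) pairs, where covariates
    is a tuple of floats. This layout is hypothetical, chosen for illustration.
    """
    effects = []
    for x_t, y_t in treated:
        # Pair the treated unit with the control unit closest in
        # Euclidean distance over the covariates.
        _, y_c = min(controls, key=lambda c: dist(x_t, c[0]))
        # The matched control outcome stands in for the missing counterfactual.
        effects.append(y_t - y_c)
    return sum(effects) / len(effects)

# Toy data (made up): two treated units, three control units.
treated = [((1.0, 2.0), 5.0), ((3.0, 1.0), 7.0)]
controls = [((1.1, 2.1), 4.0), ((2.9, 0.9), 5.0), ((10.0, 10.0), 0.0)]
att = nnm_att(treated, controls)
print(att)
```

Because the match is made in the original covariate space, this baseline degrades as the covariate dimension grows or as the two groups become unbalanced, which is the weakness the abstract targets.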
When forecasting time series with a hierarchical structure, the existing state of the art is to forecast each time series independently and then, in a post-treatment step, to reconcile the forecasts in a way that respects the hierarchy (Hyndman et al., 2011; Wickramasuriya et al., 2018). We propose a new loss function that can be incorporated into any maximum likelihood objective with hierarchical data, resulting in reconciled estimates with confidence intervals that correctly account for the additional uncertainty due to imperfect reconciliation. We evaluate our method using a non-linear model and synthetic data on a counterfactual forecasting problem, where we have access to the ground truth and contemporaneous covariates, and show that we substantially improve over the existing state-of-the-art method.
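To make the post-hoc reconciliation step concrete, here is a minimal sketch for the simplest possible hierarchy: one total and its children. It applies an ordinary least squares projection of the independent forecasts onto the set of coherent forecasts (total equals the sum of the children). It illustrates the existing two-step approach, not the loss function proposed above; the function name `reconcile_ols` and the numbers are assumptions.

```python
def reconcile_ols(total, children):
    """Orthogonally project base forecasts onto the coherent subspace.

    Coherence for a one-level hierarchy means total == sum(children).
    The constraint has normal vector (1, -1, ..., -1), so the projection
    spreads the incoherence equally across all k + 1 series.
    """
    k = len(children)
    incoherence = total - sum(children)
    adjustment = incoherence / (k + 1)
    new_total = total - adjustment
    new_children = [c + adjustment for c in children]
    return new_total, new_children

# Hypothetical independent forecasts that do not add up: 40 + 50 != 100.
new_total, new_children = reconcile_ols(100.0, [40.0, 50.0])
print(new_total, new_children)
```

Note that this projection returns only point forecasts and says nothing about how the adjustment affects uncertainty, which is the gap the abstract addresses.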
We propose a method for causal inference using satellite image time series, in order to determine the treatment effects of interventions that impact climate change, such as deforestation. Simply put, the aim is to quantify the 'before versus after' effect of climate-related, human-driven interventions, such as urbanization, as well as natural disasters, such as hurricanes and forest fires. As a concrete example, we focus on quantifying forest tree cover change (deforestation) due to human-led causes. The proposed method involves the following steps. First, we use computer vision and machine learning/deep learning techniques to detect and quantify forest tree coverage levels at every time epoch. We then examine this time series to identify changepoints. Next, we estimate the expected forest tree cover values using a Bayesian structural causal model and forecast the counterfactual. This is compared with the values actually observed post-intervention, and the difference between the two gives the effect of the intervention (relative to the non-intervention scenario, i.e. what would likely have happened without the intervention). As a specific use case, we analyze deforestation levels before and after the hyperinflation event (the intervention) in Brazil, which ended in 1993-94, for the Amazon rainforest region around Rondonia, Brazil. For this deforestation use case, our causal inference framework can help causally attribute the change or reduction in forest tree cover and the increase in deforestation rates to human activities at various points in time.
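The final comparison step, observed values minus forecast counterfactual, can be sketched as follows. This sketch deliberately swaps in a plain least squares linear trend for the Bayesian structural model named above, and it assumes the intervention time is already known rather than detected as a changepoint. The function names and the forest cover figures are hypothetical.

```python
def fit_linear_trend(ys):
    """Least squares fit of y = intercept + slope * t for t = 0..len(ys)-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

def intervention_effect(series, t0):
    """Observed minus counterfactual at each post-intervention time step.

    The counterfactual is the pre-intervention trend (indices before t0)
    extrapolated forward; t0 must leave at least two pre-intervention points.
    """
    intercept, slope = fit_linear_trend(series[:t0])
    counterfactual = [intercept + slope * t for t in range(t0, len(series))]
    return [obs - cf for obs, cf in zip(series[t0:], counterfactual)]

# Made-up tree cover percentages; intervention assumed at index 4.
cover = [90.0, 89.0, 88.0, 87.0, 84.0, 81.0, 78.0]
effects = intervention_effect(cover, 4)
print(effects)
```

Negative values indicate tree cover below what the pre-intervention trend would predict. A structural time series model would additionally provide uncertainty intervals around the counterfactual, which this simple extrapolation does not.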