
Collaborating Authors

 Abadie, Alberto


Doubly Robust Inference in Causal Latent Factor Models

arXiv.org Machine Learning

This article presents a novel framework for estimating average treatment effects in modern data-rich environments in the presence of unobserved confounding. Such environments are characterized by repeated measurements of outcomes, such as clinical metrics or purchase history, across a substantial number of units, whether patients in medical contexts or customers in online retail. As an example, consider an internet-retail platform where customers interact with various product categories. For each customer-category pair, the platform decides whether to offer a discount and records whether the customer purchased a product in the category. Given an observational dataset capturing such interactions, our objective is to infer the causal effect of offering the discount on customer purchase behavior. More specifically, we aim to infer two kinds of treatment effects: (a) tailored to product categories, the average impact of the discount on a product category across customers, and (b) tailored to customers, the average impact of the discount on a customer across product categories. This task is challenging because unobserved confounding may induce spurious associations between discount allocation and product purchases.
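The two target quantities above can be illustrated with a small simulation. The sketch below is not the paper's estimator; it is a minimal doubly robust (AIPW-style) calculation on a toy customer-by-category matrix, assuming a known treatment probability and crude per-category outcome models, just to show how row- and column-averaged effects arise from entrywise scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: an n x m matrix of customers by product categories.
n, m = 500, 40
p = 0.5                                      # assumed known treatment probability
W = (rng.random((n, m)) < p).astype(float)   # 1 = discount offered
tau = 0.3                                    # true average effect (simulation only)
Y = rng.normal(size=(n, m)) + tau * W        # observed outcome (continuous proxy)

# Crude outcome models: per-category means among treated / control entries.
mu1 = (W * Y).sum(axis=0) / np.maximum(W.sum(axis=0), 1)
mu0 = ((1 - W) * Y).sum(axis=0) / np.maximum((1 - W).sum(axis=0), 1)

# Doubly robust (AIPW) score for every customer-category entry: the outcome-model
# prediction plus an inverse-probability-weighted residual correction.
scores = mu1 - mu0 + W * (Y - mu1) / p - (1 - W) * (Y - mu0) / (1 - p)

tau_by_category = scores.mean(axis=0)   # (a) average effect per product category
tau_by_customer = scores.mean(axis=1)   # (b) average effect per customer
```

Averaging the scores down a column gives the category-level effect; averaging across a row gives the customer-level effect. The paper's contribution lies in making such estimates valid under unobserved confounding via a latent factor structure, which this sketch does not model.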


Estimating the Value of Evidence-Based Decision Making

arXiv.org Machine Learning

Business/policy decisions are often based on evidence from randomized experiments and observational studies. In this article we propose an empirical framework to estimate the value of evidence-based decision making (EBDM) and the return on the investment in statistical precision.


The Risk of Machine Learning

arXiv.org Machine Learning

Many applied settings in empirical economics involve simultaneous estimation of a large number of parameters. In particular, applied economists are often interested in estimating the effects of many-valued treatments (like teacher effects or location effects), treatment effects for many groups, and prediction models with many regressors. In these settings, machine learning methods that combine regularized estimation with data-driven choices of regularization parameters are useful to avoid over-fitting. In this article, we analyze the performance of a class of machine learning estimators that includes ridge, lasso, and pretest in contexts that require simultaneous estimation of many parameters. Our analysis aims to provide guidance to applied researchers on (i) the choice between regularized estimators in practice and (ii) data-driven selection of regularization parameters. To address (i), we characterize the risk (mean squared error) of regularized estimators and derive their relative performance as a function of simple features of the data-generating process. To address (ii), we show that data-driven choices of regularization parameters, based on Stein's unbiased risk estimate or on cross-validation, yield estimators with risk uniformly close to the risk attained under the optimal (infeasible) choice of regularization parameters. We use data from recent examples in the empirical economics literature to illustrate the practical applicability of our results.
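The idea of tuning a shrinkage parameter via Stein's unbiased risk estimate (SURE) can be sketched in the canonical normal-means setting the abstract alludes to (many effects, one noisy observation each). This is a stylized illustration with simulated data, not the paper's empirical analysis; the grid search and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Normal-means model: X_i ~ N(mu_i, 1), one observation per parameter,
# a stylized version of estimating many teacher or location effects.
n = 2000
mu = rng.normal(0.0, 1.0, size=n)    # true effects (simulation only)
X = mu + rng.normal(size=n)

# Ridge in this model shrinks linearly toward zero: mu_hat(c) = c * X,
# with c = 1 / (1 + lambda). SURE gives an unbiased estimate of the
# per-parameter MSE of c * X without knowing mu:
def sure(c, x):
    return (c - 1.0) ** 2 * np.mean(x ** 2) + 2.0 * c - 1.0

# Data-driven regularization: pick c by minimizing SURE over a grid.
grid = np.linspace(0.0, 1.0, 1001)
c_hat = grid[np.argmin([sure(c, X) for c in grid])]

# Oracle risk of the chosen estimator, computable here only because the
# simulation knows mu.
true_risk = np.mean((c_hat * X - mu) ** 2)
```

With unit-variance true effects and unit-variance noise, the SURE-minimizing shrinkage factor lands near 0.5, and the estimated risk tracks the oracle risk closely, which is the "uniformly close to the infeasible optimum" phenomenon the abstract describes.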