Collaborating Authors

Machine Learning Methods Economists Should Know About Machine Learning

We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.

Data science is science's second chance to get causal inference right: A classification of data science tasks Machine Learning

Causal inference from observational data is the goal of many health and social scientists. However, academic statistics has often frowned upon data analyses with a causal objective. The advent of data science provides a historical opportunity to redefine data analysis in such a way that it naturally accommodates causal inference from observational data. We argue that the scientific contributions of data science can be organized into three classes of tasks: description, prediction, and causal inference. An explicit classification of data science tasks is necessary to describe the role of subject-matter expert knowledge in data analysis. We discuss the implications of this classification for the use of data to guide decision making in the real world.

A Hierarchy of Limitations in Machine Learning Machine Learning

There is little argument about whether or not machine learning models are useful for applying to social systems. But if we take seriously George Box's dictum, or indeed the even older one that "the map is not the territory' (Korzybski, 1933), then there has been comparatively less systematic attention paid within the field to how machine learning models are wrong (Selbst et al., 2019) and seeing possible harms in that light. By "wrong" I do not mean in terms of making misclassifications, or even fitting over the'wrong' class of functions, but more fundamental mathematical/statistical assumptions, philosophical (in the sense used by Abbott, 1988) commitments about how we represent the world, and sociological processes of how models interact with target phenomena. This paper takes a particular model of machine learning research or application: one that its creators and deployers think provides a reliable way of interacting with the social world (whether that is through understanding, or in making predictions) without any intent to cause harm (McQuillan, 2018) and, in fact, a desire to not cause harm and instead improve the world, 1 for example as most explicitly in the various "{Data [Science], Machine Learning, Artificial Intelligence} for [Social] Good" initiatives, and more widely in framings around "fairness" or "ethics." I focus on the almost entirely statistical modern version of machine learning, rather than eclipsed older visions (see section 3). While many of the limitations I discuss apply to the use of machine learning in any domain, I focus on applications to the social world in order to explore the domain where limitations are strongest and stickiest.

A Survey on Causal Inference Artificial Intelligence

Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well known causal inference framework. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which facilitate researchers and practitioners to explore, evaluate and apply the causal inference methods.

DoWhy: Addressing Challenges in Expressing and Validating Causal Assumptions Artificial Intelligence

Estimation of causal effects involves crucial assumptions about the data-generating process, such as directionality of effect, presence of instrumental variables or mediators, and whether all relevant confounders are observed. Violation of any of these assumptions leads to significant error in the effect estimate. However, unlike cross-validation for predictive models, there is no global validator method for a causal estimate. As a result, expressing different causal assumptions formally and validating them (to the extent possible) becomes critical for any analysis. We present DoWhy, a framework that allows explicit declaration of assumptions through a causal graph and provides multiple validation tests to check a subset of these assumptions. Our experience with DoWhy highlights a number of open questions for future research: developing new ways beyond causal graphs to express assumptions, the role of causal discovery in learning relevant parts of the graph, and developing validation tests that can better detect errors, both for average and conditional treatment effects. DoWhy is available at