Collaborating Authors

Machine Learning Methods Economists Should Know About Machine Learning

We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.

Data science is science's second chance to get causal inference right: A classification of data science tasks Machine Learning

Causal inference from observational data is the goal of many health and social scientists. However, academic statistics has often frowned upon data analyses with a causal objective. The advent of data science provides a historical opportunity to redefine data analysis in such a way that it naturally accommodates causal inference from observational data. We argue that the scientific contributions of data science can be organized into three classes of tasks: description, prediction, and causal inference. An explicit classification of data science tasks is necessary to describe the role of subject-matter expert knowledge in data analysis. We discuss the implications of this classification for the use of data to guide decision making in the real world.

A Hierarchy of Limitations in Machine Learning Machine Learning

There is little argument about whether or not machine learning models are useful for applying to social systems. But if we take seriously George Box's dictum, or indeed the even older one that "the map is not the territory' (Korzybski, 1933), then there has been comparatively less systematic attention paid within the field to how machine learning models are wrong (Selbst et al., 2019) and seeing possible harms in that light. By "wrong" I do not mean in terms of making misclassifications, or even fitting over the'wrong' class of functions, but more fundamental mathematical/statistical assumptions, philosophical (in the sense used by Abbott, 1988) commitments about how we represent the world, and sociological processes of how models interact with target phenomena. This paper takes a particular model of machine learning research or application: one that its creators and deployers think provides a reliable way of interacting with the social world (whether that is through understanding, or in making predictions) without any intent to cause harm (McQuillan, 2018) and, in fact, a desire to not cause harm and instead improve the world, 1 for example as most explicitly in the various "{Data [Science], Machine Learning, Artificial Intelligence} for [Social] Good" initiatives, and more widely in framings around "fairness" or "ethics." I focus on the almost entirely statistical modern version of machine learning, rather than eclipsed older visions (see section 3). While many of the limitations I discuss apply to the use of machine learning in any domain, I focus on applications to the social world in order to explore the domain where limitations are strongest and stickiest.

A Survey on Causal Inference Artificial Intelligence

Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well known causal inference framework. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which facilitate researchers and practitioners to explore, evaluate and apply the causal inference methods.

The Seven Tools of Causal Inference, with Reflections on Machine Learning

Communications of the ACM

The dramatic success in machine learning has led to an explosion of artificial intelligence (AI) applications and increasing expectations for autonomous systems that exhibit human-level intelligence. These expectations have, however, met with fundamental obstacles that cut across many application areas. One such obstacle is adaptability, or robustness. Machine learning researchers have noted current systems lack the ability to recognize or react to new circumstances they have not been specifically programmed or trained for. Intensive theoretical and experimental efforts toward "transfer learning," "domain adaptation," and "lifelong learning"4 are reflective of this obstacle. Another obstacle is "explainability," or that "machine learning models remain mostly black boxes"26 unable to explain the reasons behind their predictions or recommendations, thus eroding users' trust and impeding diagnosis and repair; see Hutson8 and Marcus.11 A third obstacle concerns the lack of understanding of cause-effect connections.