Regulatory bodies around the world increasingly recognize that they need to regulate how governments use machine learning algorithms when making high-stakes decisions. This is a welcome development, but current approaches fall short. As regulators develop policies, they must consider how human decision makers interact with algorithms. If they do not, regulations will give governments that adopt algorithms a false sense of security. In recent years, researchers and journalists have exposed how algorithmic systems used by courts, police, education departments, welfare agencies, and other government bodies are rife with errors and biases.
As algorithmic risk assessment instruments (RAIs) are increasingly adopted to assist decision makers, their predictive performance and potential to promote inequity have come under scrutiny. However, while most studies examine these tools in isolation, researchers have come to recognize that assessing their impact requires understanding the behavior of their human interactants. In this paper, building on several recent crowdsourcing works focused on criminal justice, we conduct a vignette study in which laypersons are tasked with predicting future re-arrests. Our key findings are as follows: (1) Participants often predict that an offender will be re-arrested even when they deem the likelihood of re-arrest to be well below 50%; (2) Participants do not anchor on the RAI's predictions; (3) The time spent on the survey varies widely across participants and most cases are assessed in less than 10 seconds; (4) Judicial decisions, unlike participants' predictions, depend in part on factors that are orthogonal to the likelihood of re-arrest. These results highlight the influence of several crucial but often overlooked design decisions and concerns around generalizability when constructing crowdsourcing studies to analyze the impacts of RAIs.
Despite the recent surge of interest in designing and guaranteeing mathematical formulations of fairness, virtually all existing notions of algorithmic fairness fail to be adaptable to the intricacies and nuances of the decision-making context at hand. We argue that capturing such factors is an inherently human task, as it requires knowledge of the social background in which machine learning tools impact real people's outcomes and a deep understanding of the ramifications of automated decisions for decision subjects and society. In this work, we present a framework to construct a context-dependent mathematical formulation of fairness utilizing people's judgment of fairness. We utilize the theoretical model of Heidari et al. (2019), which shows that most existing formulations of algorithmic fairness are special cases of economic models of Equality of Opportunity (EOP), and present a practical human-in-the-loop approach to pinpoint the fairness notion in the EOP family that best captures people's perception of fairness in the given context. To illustrate our framework, we run human-subject experiments designed to learn the parameters of Heidari et al.'s EOP model (including circumstance, desert, and utility) in a hypothetical recidivism decision-making scenario. Our work takes an initial step toward democratizing the formulation of fairness and utilizing human judgment to tackle a fundamental shortcoming of automated decision-making systems: that the machine on its own is incapable of understanding and processing the human aspects and social context of its decisions.
San Francisco, CA, April 26, 2019 – The Partnership on AI (PAI) has today published a report that gathers the views of the multidisciplinary artificial intelligence and machine learning research and ethics community and documents the serious shortcomings of algorithmic risk assessment tools in the U.S. criminal justice system. These kinds of AI tools for deciding whether to detain or release defendants are in widespread use around the United States, and some legislatures have begun to mandate their use. Lessons drawn from the U.S. context have widespread applicability in other jurisdictions, too, as the international policymaking community considers the deployment of similar tools. While criminal justice risk assessment tools are often simpler than the deep neural networks used in many modern artificial intelligence systems, they are basic forms of AI. As such, they present a paradigmatic example of the high-stakes social and ethical consequences of automated AI decision-making.
Recidivism prediction instruments provide decision makers with an assessment of the likelihood that a criminal defendant will reoffend at a future point in time. While such instruments are gaining increasing popularity across the country, their use is attracting tremendous controversy. Much of the controversy concerns potential discriminatory bias in the risk assessments that are produced. This paper discusses a fairness criterion originating in the field of educational and psychological testing that has recently been applied to assess the fairness of recidivism prediction instruments. We demonstrate how adherence to the criterion may lead to considerable disparate impact when recidivism prevalence differs across groups.
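The last claim can be illustrated numerically. The abstract does not name the criterion, so the sketch below assumes it is read as predictive parity, meaning the positive predictive value (PPV) is the same in every group. For any binary classifier, the confusion-matrix definitions give the identity FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR), where p is the prevalence of recidivism in the group. The function names, group labels, and all numbers are hypothetical and chosen only for illustration.

```python
def implied_fpr(prevalence: float, ppv: float, fnr: float) -> float:
    """False positive rate implied by a group's prevalence, PPV, and FNR.

    Follows from PPV = TP / (TP + FP) with TP = p * (1 - FNR) and
    FP = (1 - p) * FPR, solved for FPR.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must be in (0, 1]")
    if not 0.0 <= fnr <= 1.0:
        raise ValueError("fnr must be in [0, 1]")
    return prevalence / (1.0 - prevalence) * (1.0 - ppv) / ppv * (1.0 - fnr)


def ppv_from_rates(prevalence: float, fpr: float, fnr: float) -> float:
    """Recover PPV from prevalence and error rates (used as a consistency check)."""
    true_pos = prevalence * (1.0 - fnr)
    false_pos = (1.0 - prevalence) * fpr
    return true_pos / (true_pos + false_pos)


# Hypothetical groups that differ only in recidivism prevalence.
prevalence_by_group = {"group_a": 0.3, "group_b": 0.5}

# Assumed shared values: equal PPV (the criterion) and equal FNR.
shared_ppv = 0.7
shared_fnr = 0.4

fpr_by_group = {
    name: implied_fpr(p, shared_ppv, shared_fnr)
    for name, p in prevalence_by_group.items()
}

for name, fpr in fpr_by_group.items():
    print(f"{name}: prevalence={prevalence_by_group[name]:.2f}, implied FPR={fpr:.3f}")
```

With PPV and the false negative rate held equal, the group with higher prevalence necessarily gets the higher false positive rate, so more of its non-recidivating members are flagged as high risk. This is one way the disparate impact described above can arise even when the instrument satisfies the criterion in both groups.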