
 Chouldechova, Alexandra


Does mitigating ML's impact disparity require treatment disparity?

arXiv.org Machine Learning

Following related work in law and policy, two notions of disparity have come to shape the study of fairness in algorithmic decision-making. Algorithms exhibit treatment disparity if they formally treat members of protected subgroups differently; algorithms exhibit impact disparity when outcomes differ across subgroups, even if the correlation arises unintentionally. Naturally, we can achieve impact parity through purposeful treatment disparity. In one thread of technical work, papers aim to reconcile the two forms of parity by proposing disparate learning processes (DLPs). Here, the learning algorithm can see group membership during training but produce a classifier that is group-blind at test time. In this paper, we show theoretically that: (i) When other features correlate with group membership, DLPs will (indirectly) implement treatment disparity, undermining the policy desiderata they are designed to address; (ii) When group membership is partly revealed by other features, DLPs induce within-class discrimination; and (iii) In general, DLPs provide a suboptimal trade-off between accuracy and impact parity. Based on our technical analysis, we argue that transparent treatment disparity is preferable to occluded methods for achieving impact parity. Experimental results on several real-world datasets highlight the practical consequences of applying DLPs vs. per-group thresholds.
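For concreteness, the sketch below contrasts the two approaches the abstract compares: a single group-blind decision threshold versus explicit per-group thresholds chosen to equalize selection rates (impact parity achieved through transparent treatment disparity). The score distributions, group labels, and target rate are synthetic assumptions for illustration; this is not the paper's experimental setup.

```python
# Illustrative sketch (not the paper's code): contrast a single group-blind
# threshold with per-group thresholds that equalize selection rates.
# All data and variable names here are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic risk scores for two groups, A and B, with different score distributions.
scores_a = rng.beta(2, 5, size=1000)   # group A tends to score lower
scores_b = rng.beta(5, 2, size=1000)   # group B tends to score higher

target_rate = 0.30  # desired fraction of positive decisions in each group

# Group-blind rule: one threshold applied to everyone.
all_scores = np.concatenate([scores_a, scores_b])
blind_threshold = np.quantile(all_scores, 1 - target_rate)

# Per-group thresholds: explicit treatment disparity that yields impact parity.
threshold_a = np.quantile(scores_a, 1 - target_rate)
threshold_b = np.quantile(scores_b, 1 - target_rate)

for name, scores, thresh in [("A (blind)", scores_a, blind_threshold),
                             ("B (blind)", scores_b, blind_threshold),
                             ("A (per-group)", scores_a, threshold_a),
                             ("B (per-group)", scores_b, threshold_b)]:
    print(f"{name:15s} selection rate = {(scores >= thresh).mean():.2f}")
```

With the group-blind threshold, the selection rates diverge whenever the score distributions differ by group; the per-group thresholds equalize them by construction.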


Fairer and more accurate, but for whom?

arXiv.org Machine Learning

Complex statistical machine learning models are increasingly being used or considered for use in high-stakes decision-making pipelines in domains such as financial services, health care, criminal justice and human services. These models are often investigated as possible improvements over more classical tools such as regression models or human judgement. While the modeling approach may be new, the practice of using some form of risk assessment to inform decisions is not. When determining whether a new model should be adopted, it is therefore essential to be able to compare the proposed model to the existing approach across a range of task-relevant accuracy and fairness metrics. Looking at overall performance metrics, however, may be misleading. Even when two models have comparable overall performance, they may nevertheless disagree in their classifications on a considerable fraction of cases. In this paper we introduce a model comparison framework for automatically identifying subgroups in which the differences between models are most pronounced. Our primary focus is on identifying subgroups where the models differ in terms of fairness-related quantities such as racial or gender disparities. We present experimental results from a recidivism prediction task and a hypothetical lending example.
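As a rough illustration of the kind of comparison the paper motivates, the snippet below takes two models' binary classifications and ranks subgroups by how often the models disagree. The subgroup variables, data, and disagreement rates are invented for the example, and the comparison runs over a fixed partition; the paper's framework identifies such subgroups automatically rather than assuming them in advance.

```python
# Minimal sketch of the idea behind subgroup-level model comparison: given two
# models' predictions, find the subgroups where they disagree most often.
# This is a simplified stand-in, not the paper's framework; the data, subgroup
# definitions, and models are all assumed for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({
    "age_band": rng.choice(["18-25", "26-40", "41-60", "60+"], size=n),
    "group": rng.choice(["g1", "g2"], size=n),
    "pred_old": rng.integers(0, 2, size=n),   # classifications from the existing model
})
# The new model mostly agrees, but flips more decisions for one subgroup.
flip = rng.random(n) < np.where((df["age_band"] == "18-25") & (df["group"] == "g2"), 0.4, 0.05)
df["pred_new"] = np.where(flip, 1 - df["pred_old"], df["pred_old"])

# Disagreement rate per subgroup, largest first.
disagree = (
    df.assign(disagree=(df["pred_old"] != df["pred_new"]).astype(int))
      .groupby(["age_band", "group"])["disagree"]
      .mean()
      .sort_values(ascending=False)
)
print(disagree.head())
```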


Fair prediction with disparate impact: A study of bias in recidivism prediction instruments

arXiv.org Machine Learning

Recidivism prediction instruments (RPIs) provide decision makers with an assessment of the likelihood that a criminal defendant will reoffend at a future point in time. While such instruments are gaining increasing popularity across the country, their use is attracting tremendous controversy. Much of the controversy concerns potential discriminatory bias in the risk assessments that are produced. This paper discusses several fairness criteria that have recently been applied to assess the fairness of recidivism prediction instruments. We demonstrate that the criteria cannot all be simultaneously satisfied when recidivism prevalence differs across groups. We then show how disparate impact can arise when a recidivism prediction instrument fails to satisfy the criterion of error rate balance.
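A minimal way to see the quantities involved: the sketch below computes prevalence, positive predictive value (PPV), false positive rate (FPR), and false negative rate (FNR) per group for a classifier whose per-class error rates are the same in both groups by construction. Because the synthetic groups differ in recidivism prevalence, the PPVs come out unequal, illustrating why the criteria cannot all hold at once. The data and error rates are assumptions made for illustration, not results from the paper.

```python
# Hedged sketch: given binary predictions and outcomes for two groups, compute
# the per-group quantities behind the fairness criteria discussed above
# (prevalence, PPV, FPR, FNR). The data below are synthetic, with different
# base rates by group but identical per-class error rates.
import numpy as np

def group_metrics(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return {
        "prevalence": y_true.mean(),
        "PPV": tp / (tp + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }

rng = np.random.default_rng(2)
for name, prev in [("group A", 0.3), ("group B", 0.5)]:
    y_true = (rng.random(20000) < prev).astype(int)
    # Same classifier behavior in both groups: 70% of positives and 20% of
    # negatives are labeled high risk.
    y_pred = ((y_true == 1) & (rng.random(20000) < 0.7)) | ((y_true == 0) & (rng.random(20000) < 0.2))
    metrics = group_metrics(y_true, y_pred.astype(int))
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```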


Fair prediction with disparate impact: A study of bias in recidivism prediction instruments

arXiv.org Machine Learning

Recidivism prediction instruments provide decision makers with an assessment of the likelihood that a criminal defendant will reoffend at a future point in time. While such instruments are gaining increasing popularity across the country, their use is attracting tremendous controversy. Much of the controversy concerns potential discriminatory bias in the risk assessments that are produced. This paper discusses a fairness criterion originating in the field of educational and psychological testing that has recently been applied to assess the fairness of recidivism prediction instruments. We demonstrate how adherence to the criterion may lead to considerable disparate impact when recidivism prevalence differs across groups.
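The abstract's claim can be made concrete with an identity that follows directly from the definitions of prevalence, positive predictive value (PPV), false positive rate (FPR), and false negative rate (FNR) for a binary classifier; the restatement below uses generic notation rather than quoting the paper.

```latex
% Prevalence p = P(Y = 1); PPV, FPR, FNR as usual for a binary classifier.
% Writing TP, FP, FN, TN for the confusion-matrix counts:
%   PPV = TP / (TP + FP),  FPR = FP / (FP + TN),  FNR = FN / (FN + TP).
% Solving the PPV definition for FP and dividing by the number of negative
% instances, (1 - p) N, gives
\[
  \mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr).
\]
% Holding PPV equal across two groups whose prevalence p differs therefore
% forces FPR and FNR to differ between the groups: the error rates cannot
% both be balanced, which is the source of the disparate impact discussed above.
```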


Generalized Additive Model Selection

arXiv.org Machine Learning

We introduce GAMSEL (Generalized Additive Model Selection), a penalized likelihood approach for fitting sparse generalized additive models in high dimension. Our method interpolates between null, linear and additive models by allowing the effect of each variable to be estimated as being either zero, linear, or a low-complexity curve, as determined by the data. We present a blockwise coordinate descent procedure for efficiently optimizing the penalized likelihood objective over a dense grid of the tuning parameter, producing a regularization path of additive models. We demonstrate the performance of our method on both real and simulated data examples, and compare it with existing techniques for additive model selection.
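To convey the "zero, linear, or low-complexity curve" idea, here is a toy per-feature selection using a BIC-style criterion. It is not GAMSEL: there is no shared penalized likelihood objective, no regularization path, and no blockwise coordinate descent, and a cubic polynomial stands in for the low-complexity smooth term. The method itself is available as the authors' R package gamsel.

```python
# Conceptual sketch only: each feature's effect is chosen to be zero, linear,
# or a low-complexity curve. This scores the three options per feature with a
# BIC-style criterion on univariate fits, an assumption made purely for
# illustration of the selection idea, not the GAMSEL penalty or algorithm.
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x1 = rng.uniform(-2, 2, n)          # truly nonlinear effect
x2 = rng.uniform(-2, 2, n)          # truly linear effect
x3 = rng.uniform(-2, 2, n)          # no effect
y = np.sin(2 * x1) + 0.5 * x2 + rng.normal(0, 0.3, n)

def bic(residuals, n_params):
    rss = np.sum(residuals ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    # Option 1: zero (intercept only).
    zero_resid = y - y.mean()
    # Option 2: linear.
    lin_coef = np.polyfit(x, y, deg=1)
    lin_resid = y - np.polyval(lin_coef, x)
    # Option 3: low-complexity curve (cubic polynomial as a stand-in for a spline).
    cur_coef = np.polyfit(x, y, deg=3)
    cur_resid = y - np.polyval(cur_coef, x)
    scores = {"zero": bic(zero_resid, 1), "linear": bic(lin_resid, 2), "curve": bic(cur_resid, 4)}
    print(name, min(scores, key=scores.get), {k: round(v, 1) for k, v in scores.items()})
```

On this synthetic data the criterion typically selects a curve for x1, a linear term for x2, and zero for x3, mirroring the null/linear/additive interpolation the abstract describes.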