On (assessing) the fairness of risk score models

Eike Petersen, Melanie Ganz, Sune Hannibal Holm, Aasa Feragen

arXiv.org Artificial Intelligence 

To date, much of the algorithmic fairness literature has focused on the fairness of classification systems that are used, for example, to decide whether a person should be granted a loan or be released from prison on bail. Even in cases where such classification decisions are based on risk score models, as in the highly influential COMPAS case [5, 11, 16], their fairness is typically considered a function of the decisions, or classifications, made by the system. Of course, any risk score model can be turned into a classifier by selecting a probability threshold (in binary classification) or predicting the most likely outcome (in multi-class classification). Nevertheless, we argue here that it is worthwhile to distinguish between these two settings and to consider the fairness of risk score models independently of their downstream use, be it as the basis for a classifier or otherwise. We discuss notions of fairness for risk scores as well as their relationship to classical, classification-level notions of fairness, and we develop robust tools to empirically quantify risk score fairness. We illustrate our methodology in two case studies, one situated in the criminal justice system and one in healthcare.

Why distinguish between fair models and fair decisions? In the statistical literature, it is generally considered desirable to distinguish between inference (e.g., identifying a risk score model) and subsequent decision-making (e.g., deriving a classification from a risk score model): while the former represents a purely statistical task, the latter depends on subjective
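The score-to-classifier conversion described in the abstract is easy to make concrete. The following is a minimal sketch, not taken from the paper: it assumes risk scores are given as NumPy arrays of predicted probabilities, and the function names and the threshold value of 0.5 are illustrative choices, not anything the authors prescribe.

```python
import numpy as np

def binary_decisions(risk_scores: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binary classification: predict the positive class whenever the
    risk score meets or exceeds the chosen probability threshold."""
    return (risk_scores >= threshold).astype(int)

def multiclass_decisions(class_probs: np.ndarray) -> np.ndarray:
    """Multi-class classification: predict the most likely outcome,
    i.e., the class with the highest predicted probability per row."""
    return np.argmax(class_probs, axis=1)

# Example usage with made-up scores:
scores = np.array([0.12, 0.55, 0.49, 0.91])
print(binary_decisions(scores, threshold=0.5))   # -> [0 1 0 1]

probs = np.array([[0.2, 0.5, 0.3],
                  [0.7, 0.1, 0.2]])
print(multiclass_decisions(probs))               # -> [1 0]
```

Note that the threshold is exactly the kind of subjective, use-dependent choice the authors point to: the same risk score model yields different classifiers, and hence potentially different classification-level fairness properties, under different thresholds.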
