Dataset Fairness: Achievable Fairness on Your Data With Utility Guarantees

Muhammad Faaiz Taufiq, Jean-Francois Ton, Yang Liu

arXiv.org Machine Learning 

One of the key challenges in fairness for machine learning is to train models that minimize the disparity across sensitive groups such as race or gender [Caton and Haas, 2020, Ustun et al., 2019, Celis et al., 2019]. This often comes at the cost of reduced model accuracy, a phenomenon termed the accuracy-fairness trade-off in the literature [Valdivia et al., 2021, Martinez et al., 2020]. In practice, this trade-off can differ significantly across datasets, depending on factors such as dataset biases and imbalances [Agarwal et al., 2018, Bendekgey and Sudderth, 2021, Celis et al., 2021]. To see why these trade-offs are inherently dataset-dependent, consider a simple example involving two distinct crime datasets. Dataset A has records from a community where crime rates are uniformly distributed across all racial groups, whereas Dataset B comes from a community where historical factors have resulted in a disproportionate crime rate among a specific racial group. Intuitively, training racially agnostic models is more challenging for Dataset B, because of the unequal distribution of crime rates across racial groups, and will result in a greater loss of model accuracy than for Dataset A. This example underscores that imposing a uniform fairness requirement across diverse datasets (such as requiring the fairness violation metric to be below 10% for both datasets), while also adhering to essential accuracy benchmarks, is impractical.
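As an illustrative aside (not from the paper), the minimal Python sketch below mimics the two-dataset example with synthetic data: in Dataset A the label is independent of the sensitive attribute, while in Dataset B it is strongly correlated with it. It trains an unconstrained logistic regression on each and reports accuracy together with the demographic parity difference, used here as a stand-in for the fairness violation metric. The data-generating process and all parameter choices are hypothetical.

```python
# Hypothetical sketch: the accuracy-fairness trade-off is dataset-dependent.
# Dataset A: label independent of the sensitive group; Dataset B: label
# strongly correlated with it. We measure accuracy and the demographic
# parity (DP) violation of an unconstrained classifier on each.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5000

def make_dataset(group_label_correlation):
    """Sample features X, sensitive attribute a, and binary label y."""
    a = rng.integers(0, 2, size=n)              # sensitive group (0/1)
    x = rng.normal(size=(n, 2))
    # Label logits depend on the features and, optionally, on the group.
    logits = x[:, 0] + 0.5 * x[:, 1] + group_label_correlation * (2 * a - 1)
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    features = np.column_stack([x, a])          # model may use the group
    return features, y, a

def dp_violation(y_pred, a):
    """Demographic parity violation: gap in positive prediction rates."""
    return abs(y_pred[a == 0].mean() - y_pred[a == 1].mean())

for name, corr in [("Dataset A (balanced)", 0.0), ("Dataset B (biased)", 2.0)]:
    X, y, a = make_dataset(corr)
    pred = LogisticRegression().fit(X, y).predict(X)
    print(f"{name}: accuracy={accuracy_score(y, pred):.2f}, "
          f"DP violation={dp_violation(pred, a):.2f}")
```

Typically, the unconstrained model on Dataset A already exhibits a small DP violation at no accuracy cost, whereas on Dataset B any method forced to meet the same fairness target (e.g., a violation below 10%) would have to deviate substantially from the accuracy-maximizing predictor, illustrating why a uniform fairness requirement across datasets is impractical.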
