Dataset Fairness: Achievable Fairness on Your Data With Utility Guarantees

Muhammad Faaiz Taufiq, Jean-Francois Ton, Yang Liu

arXiv.org Machine Learning 

One of the key challenges in fairness for machine learning is to train models that minimize the disparity across sensitive groups such as race or gender [Caton and Haas, 2020, Ustun et al., 2019, Celis et al., 2019]. This often comes at the cost of reduced model accuracy, a phenomenon termed the accuracy-fairness trade-off in the literature [Valdivia et al., 2021, Martinez et al., 2020]. In practice, this trade-off can differ significantly across datasets, depending on factors such as dataset biases and imbalances [Agarwal et al., 2018, Bendekgey and Sudderth, 2021, Celis et al., 2021]. To see why these trade-offs are inherently dataset-dependent, consider a simple example involving two distinct crime datasets. Dataset A has records from a community where crime rates are uniformly distributed across all racial groups, whereas Dataset B comes from a community where historical factors have resulted in a disproportionate crime rate among a specific racial group. Intuitively, training racially agnostic models is more challenging for Dataset B, because of the unequal distribution of crime rates across racial groups, and will result in a greater loss of model accuracy than for Dataset A. This example underscores that imposing a uniform fairness requirement across diverse datasets (such as requiring the fairness violation metric to be below 10% for both datasets), while also adhering to essential accuracy benchmarks, is impractical.
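As an illustrative aside (not from the paper), the minimal Python sketch below mimics the two-dataset example with synthetic data: in Dataset A the label is independent of the sensitive attribute, while in Dataset B it is strongly correlated with it. It trains an unconstrained logistic regression on each and reports accuracy together with the demographic parity difference, used here as a stand-in for the fairness violation metric. The data-generating process and all parameter choices are hypothetical.

```python
# Hypothetical sketch: the accuracy-fairness trade-off is dataset-dependent.
# Dataset A: label independent of the sensitive group; Dataset B: label
# strongly correlated with it. We measure accuracy and the demographic
# parity (DP) violation of an unconstrained classifier on each.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5000

def make_dataset(group_label_correlation):
    """Sample features X, sensitive attribute a, and binary label y."""
    a = rng.integers(0, 2, size=n)              # sensitive group (0/1)
    x = rng.normal(size=(n, 2))
    # Label logits depend on the features and, optionally, on the group.
    logits = x[:, 0] + 0.5 * x[:, 1] + group_label_correlation * (2 * a - 1)
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    features = np.column_stack([x, a])          # model may use the group
    return features, y, a

def dp_violation(y_pred, a):
    """Demographic parity violation: gap in positive prediction rates."""
    return abs(y_pred[a == 0].mean() - y_pred[a == 1].mean())

for name, corr in [("Dataset A (balanced)", 0.0), ("Dataset B (biased)", 2.0)]:
    X, y, a = make_dataset(corr)
    pred = LogisticRegression().fit(X, y).predict(X)
    print(f"{name}: accuracy={accuracy_score(y, pred):.2f}, "
          f"DP violation={dp_violation(pred, a):.2f}")
```

Typically, the unconstrained model on Dataset A already exhibits a small DP violation at no accuracy cost, whereas on Dataset B any method forced to meet the same fairness target (e.g., a violation below 10%) would have to deviate substantially from the accuracy-maximizing predictor, illustrating why a uniform fairness requirement across datasets is impractical.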
