Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness

Pfohl, Stephen R.; Harris, Natalie; Nagpal, Chirag; Madras, David; Mhasawade, Vishwali; Salaudeen, Olawale; Dieng, Awa; Sequeira, Shannon; Arciniegas, Santiago; Sung, Lillian; Ezeanochie, Nnamdi; Cole-Lewis, Heather; Heller, Katherine; Koyejo, Sanmi; D'Amour, Alexander

Jun-5-2025, arXiv.org Machine Learning

Disaggregated evaluation across subgroups is critical for assessing the fairness of machine learning models, but its uncritical use can mislead practitioners. We show that equal performance across subgroups is an unreliable measure of fairness when data are representative of the relevant populations but reflective of real-world disparities. Furthermore, when data are not representative due to selection bias, both disaggregated evaluation and alternative approaches based on conditional independence testing may be invalid without explicit assumptions regarding the bias mechanism. We use causal graphical models to predict metric stability across subgroups under different data generating processes. Our framework suggests complementing disaggregated evaluations with explicit causal assumptions and analysis to control for confounding and distribution shift, including conditional independence testing and weighted performance estimation. These findings have broad implications for how practitioners design and interpret model assessments given the ubiquity of disaggregated evaluation.
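As a rough illustration of two ideas named in the abstract, disaggregated evaluation and weighted performance estimation, the sketch below computes per-subgroup accuracy with and without per-example weights. It is not the authors' implementation: the function name `disaggregated_accuracy`, the toy data, and the weights are all assumptions made up for this example. In practice the weights would have to come from an explicit model of the selection mechanism (for instance, inverse selection probabilities), which is exactly the kind of assumption the abstract says must be stated.

```python
import numpy as np


def disaggregated_accuracy(y_true, y_pred, group, weights=None):
    """Return {subgroup: accuracy}, optionally weighting each example.

    Hypothetical helper for illustration only. With weights=None every
    example counts equally (plain disaggregated evaluation).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    if weights is None:
        weights = np.ones(len(y_true), dtype=float)
    else:
        weights = np.asarray(weights, dtype=float)

    # 1.0 where the prediction matches the label, 0.0 otherwise.
    correct = (y_true == y_pred).astype(float)

    result = {}
    for g in np.unique(group):
        mask = group == g
        # Weighted mean of correctness within the subgroup.
        result[g.item()] = float(np.average(correct[mask], weights=weights[mask]))
    return result


# Toy data, invented for this sketch.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Assumed weights, standing in for inverse selection probabilities.
weights = [1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 2.0, 1.0]

unweighted = disaggregated_accuracy(y_true, y_pred, group)
weighted = disaggregated_accuracy(y_true, y_pred, group, weights)

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

On this toy data the unweighted gap between subgroups changes direction once the assumed weights are applied, which shows how conclusions drawn from a disaggregated evaluation can depend on what is assumed about how the evaluation sample was selected.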

data mining, machine learning, prediction


Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - United States
    - Alaska (0.04)
    - Wisconsin > Dane County
      - Madison (0.04)
    - New York > New York County
      - New York City (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - California > Santa Clara County
      - Mountain View (0.04)
      - Stanford (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - France (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)

Genre:
- Research Report > New Finding (0.30)

Industry:
- Health & Medicine
  - Therapeutic Area > Cardiology/Vascular Diseases (0.68)
  - Diagnostic Medicine > Imaging (0.67)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning > Performance Analysis
      - Accuracy (1.00)
