Critical Appraisal of Fairness Metrics in Clinical Predictive AI
Matos, João; Van Calster, Ben; Celi, Leo Anthony; Dhiman, Paula; Gichoya, Judy Wawira; Riley, Richard D.; Russell, Chris; Khalid, Sara; Collins, Gary S.
arXiv.org Artificial Intelligence
Predictive artificial intelligence (AI) offers an opportunity to improve clinical practice and patient outcomes, but it risks perpetuating biases if fairness is inadequately addressed. Yet the definition of "fairness" remains unclear. We conducted a scoping review to identify and critically appraise fairness metrics for clinical predictive AI. We defined a "fairness metric" as a measure quantifying whether a model discriminates, in the societal sense, against individuals or groups defined by sensitive attributes. We searched five databases covering 2014-2024, screened 820 records, included 41 studies, and extracted 62 fairness metrics. We classified the metrics by performance dependency, model output level, and base performance metric. This revealed a fragmented landscape with limited clinical validation and an overreliance on threshold-dependent measures. Eighteen metrics were explicitly developed for healthcare, and only one of these was a clinical utility metric. Our findings highlight conceptual challenges in defining and quantifying fairness and identify gaps in uncertainty quantification, intersectionality, and real-world applicability.
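The abstract reports an overreliance on threshold-dependent measures. The minimal Python sketch below illustrates why the choice of threshold matters. It uses made-up risk scores for two hypothetical groups, A and B. A threshold-free comparison (per-group AUC) can be identical across groups, while a threshold-dependent gap in true positive rate (in the spirit of "equal opportunity") appears or vanishes depending on the chosen cut-off. The data, the function names, and the max-minus-min gap definition are illustrative assumptions. They are not metrics taken from the review.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: 6 patients per group, observed outcomes and predicted risks.
Y_TRUE = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
Y_PROB = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1, 0.8, 0.55, 0.35, 0.5, 0.2, 0.05]
GROUPS = ["A"] * 6 + ["B"] * 6


def split_by_group(
    y_true: Sequence[int], y_prob: Sequence[float], groups: Sequence[str]
) -> Dict[str, Tuple[List[int], List[float]]]:
    """Collect outcomes and predicted risks separately for each sensitive-attribute group."""
    out: Dict[str, Tuple[List[int], List[float]]] = {}
    for y, p, g in zip(y_true, y_prob, groups):
        ys, ps = out.setdefault(g, ([], []))
        ys.append(y)
        ps.append(p)
    return out


def true_positive_rate(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Threshold-dependent: share of true cases whose predicted risk is at or above the cut-off."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    if not positives:
        return float("nan")
    return sum(p >= threshold for p in positives) / len(positives)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Threshold-free: probability that a random case outranks a random non-case (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((1.0 if a > b else (0.5 if a == b else 0.0)) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def tpr_gap(y_true, y_prob, groups, threshold: float) -> float:
    """Largest minus smallest group TPR at one threshold (assumes every group has cases)."""
    by_group = split_by_group(y_true, y_prob, groups)
    tprs = [true_positive_rate(ys, ps, threshold) for ys, ps in by_group.values()]
    return max(tprs) - min(tprs)


def auc_gap(y_true, y_prob, groups) -> float:
    """Largest minus smallest group AUC; independent of any threshold."""
    by_group = split_by_group(y_true, y_prob, groups)
    aucs = [auc(ys, ps) for ys, ps in by_group.values()]
    return max(aucs) - min(aucs)


if __name__ == "__main__":
    for name, (ys, ps) in split_by_group(Y_TRUE, Y_PROB, GROUPS).items():
        print(f"group {name}: AUC = {auc(ys, ps):.3f}")
    print(f"AUC gap = {auc_gap(Y_TRUE, Y_PROB, GROUPS):.3f}")
    for t in (0.3, 0.5, 0.6):
        print(f"threshold {t}: TPR gap = {tpr_gap(Y_TRUE, Y_PROB, GROUPS, t):.3f}")
```

With these invented scores, both groups have the same AUC (8/9). The TPR gap is 0 at thresholds 0.3 and 0.5 but 1/3 at 0.6. A single threshold-dependent number can therefore suggest fairness or unfairness depending on a cut-off that may not reflect clinical use.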
Jun-23-2025
- Country:
  - Africa (0.04)
  - North America > United States
    - Massachusetts
      - Suffolk County > Boston (0.04)
      - Middlesex County > Cambridge (0.04)
    - Georgia > Fulton County
      - Atlanta (0.04)
  - Europe
    - United Kingdom > England
      - Oxfordshire > Oxford (0.14)
      - West Midlands > Birmingham (0.04)
      - Leicestershire > Leicester (0.04)
    - Belgium > Flanders
      - Flemish Brabant > Leuven (0.04)
  - Asia > Middle East
    - Israel (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.87)