When is Multicalibration Post-Processing Necessary?
Hansen, Dutch, Devic, Siddartha, Nakkiran, Preetum, Sharan, Vatsal
–arXiv.org Artificial Intelligence
A popular approach to ensuring that probabilistic predictions from machine learning algorithms are meaningful is model calibration. Intuitively, calibration requires that amongst all samples given score p [0, 1] by an ML algorithm, exactly a p-fraction of those samples have positive label. Calibration ensures that a predictor has an accurate estimate of its own predictive uncertainty, and is a fundamental requirement in applications where probabilities may be taken into account for high-stake decisions such as disease diagnosis (Dahabreh et al., 2017) or credit/lending decisions (Bequé et al., 2017). Miscalibration can result in undesirable downstream consequences when probabilistic predictions are thresholded into decisions: if a predictor has high calibration error in disease diagnosis, for example, the individuals assigned lower predicted probabilities may be unfairly denied treatment. Calibration has a long history in the machine learning community (Guo et al., 2017; Minderer et al., 2021; Niculescu-Mizil and Caruana, 2005; Platt et al., 1999), but was arguably first introduced in fairness contexts by Cleary (1968). More recently, it has appeared in the algorithmic fairness community via the seminal works of Chouldechova (2017); Kleinberg et al. (2017). Although calibration ensures meaningful uncertainty estimates aggregated over the entire population, it does not preclude potential discrimination at the level of groups of individuals: a model may be well calibrated overall but systematically underestimate the risk or qualification probability on historically underrepresented subsets of individuals. For example, Obermeyer et al. (2019) show differing calibration error rates across groups defined by race for prediction in high-risk patient care management systems. As pointed out by Obermeyer et al. (2019), in the
arXiv.org Artificial Intelligence
Jun-10-2024
- Country:
- North America > United States > California (0.14)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Technology: