isotonic 0
- North America > United States > California (0.14)
- North America > United States > Texas (0.04)
- North America > Greenland (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > California (0.14)
- North America > United States > Texas (0.04)
- North America > Greenland (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
When is Multicalibration Post-Processing Necessary?
Hansen, Dutch, Devic, Siddartha, Nakkiran, Preetum, Sharan, Vatsal
A popular approach to ensuring that probabilistic predictions from machine learning algorithms are meaningful is model calibration. Intuitively, calibration requires that amongst all samples given score p [0, 1] by an ML algorithm, exactly a p-fraction of those samples have positive label. Calibration ensures that a predictor has an accurate estimate of its own predictive uncertainty, and is a fundamental requirement in applications where probabilities may be taken into account for high-stake decisions such as disease diagnosis (Dahabreh et al., 2017) or credit/lending decisions (Bequé et al., 2017). Miscalibration can result in undesirable downstream consequences when probabilistic predictions are thresholded into decisions: if a predictor has high calibration error in disease diagnosis, for example, the individuals assigned lower predicted probabilities may be unfairly denied treatment. Calibration has a long history in the machine learning community (Guo et al., 2017; Minderer et al., 2021; Niculescu-Mizil and Caruana, 2005; Platt et al., 1999), but was arguably first introduced in fairness contexts by Cleary (1968). More recently, it has appeared in the algorithmic fairness community via the seminal works of Chouldechova (2017); Kleinberg et al. (2017). Although calibration ensures meaningful uncertainty estimates aggregated over the entire population, it does not preclude potential discrimination at the level of groups of individuals: a model may be well calibrated overall but systematically underestimate the risk or qualification probability on historically underrepresented subsets of individuals. For example, Obermeyer et al. (2019) show differing calibration error rates across groups defined by race for prediction in high-risk patient care management systems. As pointed out by Obermeyer et al. (2019), in the
- North America > United States > California (0.14)
- North America > United States > Texas (0.04)
- North America > Greenland (0.04)
- (3 more...)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Education > Educational Setting (0.67)
Class-wise and reduced calibration methods
Panchenko, Michael, Benmerzoug, Anes, Delgado, Miguel de Benito
For many applications of probabilistic classifiers it is important that the predicted confidence vectors reflect true probabilities (one says that the classifier is calibrated). It has been shown that common models fail to satisfy this property, making reliable methods for measuring and improving calibration important tools. Unfortunately, obtaining these is far from trivial for problems with many classes. We propose two techniques that can be used in tandem. First, a reduced calibration method transforms the original problem into a simpler one. We prove for several notions of calibration that solving the reduced problem minimizes the corresponding notion of miscalibration in the full problem, allowing the use of non-parametric recalibration methods that fail in higher dimensions. Second, we propose class-wise calibration methods, based on intuition building on a phenomenon called neural collapse and the observation that most of the accurate classifiers found in practice can be thought of as a union of K different functions which can be recalibrated separately, one for each class. These typically out-perform their non class-wise counterparts, especially for classifiers trained on imbalanced data sets. Applying the two methods together results in class-wise reduced calibration algorithms, which are powerful tools for reducing the prediction and per-class calibration errors. We demonstrate our methods on real and synthetic datasets and release all code as open source at https://github.com/appliedAI-Initiative
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
- Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.04)