set-valued classifier
Set to Be Fair: Demographic Parity Constraints for Set-Valued Classification
Cohen, Eyal, Denis, Christophe, Hebiri, Mohamed
Set-valued classification is used in multiclass settings where confusion between classes can occur and lead to misleading predictions. However, its application may amplify discriminatory bias motivating the development of set-valued approaches under fairness constraints. In this paper, we address the problem of set-valued classification under demographic parity and expected size constraints. We propose two complementary strategies: an oracle-based method that minimizes classification risk while satisfying both constraints, and a computationally efficient proxy that prioritizes constraint satisfaction. For both strategies, we derive closed-form expressions for the (optimal) fair set-valued classifiers and use these to build plug-in, data-driven procedures for empirical predictions. We establish distribution-free convergence rates for violations of the size and fairness constraints for both methods, and under mild assumptions we also provide excess-risk bounds for the oracle-based approach. Empirical results demonstrate the effectiveness of both strategies and highlight the efficiency of our proxy method.
Learning Acceptance Regions for Many Classes with Anomaly Detection
In multicategory classification, traditional methods return a single class label as the prediction without a confidence measure attached. For points near the classification boundary where the classes overlap, these methods may misclassify with high probability. As classification and machine learning in general have played a more and more significant role in high stake domains, these mistakes can incur severe consequences. To avoid making mistakes when they are likely to happen, set-valued classification methods have emerged (Herbei and Wegkamp, 2006; Shafer and Vovk, 2008; Dümbgen et al., 2008; Denis and Hebiri, 2017; Wang and Qiao, 2018; Zhang et al., 2018; Sadinle et al., 2019). A set-valued classifier may return multiple class labels as the prediction for each observation. Specifically, those near the boundary between classes may receive multiple labels as the prediction. Herbei and Wegkamp (2006), Bartlett and Wegkamp (2008), Ramaswamy et al. (2015) and Zhang et al. (2018) proposed and developed Classification with a Reject Option (CRO) by training a classifier and a rejector at the same time. A rejector determines when to refuse to make a classification for ambiguous points (i.e.
Set-valued classification -- overview via a unified framework
Chzhen, Evgenii, Denis, Christophe, Hebiri, Mohamed, Lorieul, Titouan
Multi-class classification problem is among the most popular and well-studied statistical frameworks. Modern multi-class datasets can be extremely ambiguous and single-output predictions fail to deliver satisfactory performance. By allowing predictors to predict a set of label candidates, set-valued classification offers a natural way to deal with this ambiguity. Several formulations of set-valued classification are available in the literature and each of them leads to different prediction strategies. The present survey aims to review popular formulations using a unified statistical framework. The proposed framework encompasses previously considered and leads to new formulations as well as it allows to understand underlying trade-offs of each formulation. We provide infinite sample optimal set-valued classification strategies and review a general plug-in principle to construct data-driven algorithms. The exposition is supported by examples and pointers to both theoretical and practical contributions. Finally, we provide experiments on real-world datasets comparing these approaches in practice and providing general practical guidelines.
Least Ambiguous Set-Valued Classifiers with Bounded Error Levels
Sadinle, Mauricio, Lei, Jing, Wasserman, Larry
In most classification tasks there are observations that are ambiguous and therefore difficult to correctly label. Set-valued classification allows the classifiers to output a set of plausible labels rather than a single label, thereby giving a more appropriate and informative treatment to the labeling of ambiguous instances. We introduce a framework for multiclass set-valued classification, where the classifiers guarantee user-defined levels of coverage or confidence (the probability that the true label is contained in the set) while minimizing the ambiguity (the expected size of the output). We first derive oracle classifiers assuming the true distribution to be known. We show that the oracle classifiers are obtained from level sets of the functions that define the conditional probability of each class. Then we develop estimators with good asymptotic and finite sample properties. The proposed classifiers build on and refine many existing single-label classifiers. The optimal classifier can sometimes output the empty set. We provide two solutions to fix this issue that are suitable for various practical needs.