Ask for More Than Bayes Optimal: A Theory of Indecisions for Classification
Mohamed Ndaoud, Peter Radchenko, Bradley Rava
In this work, we address the problem of controlling a classifier's accuracy at any user-specified level through selective classification, regardless of the problem's inherent difficulty. Traditional classification frameworks are designed to approximate the Bayes optimal error rate as closely as possible. However, with the growing deployment of artificial intelligence (AI) systems in automated, high-stakes decision-making, it has become critical to ensure reliable control over a classifier's accuracy and to guarantee accurate predictions for all individuals. When the underlying problem is truly difficult, as measured by the distance between the true distributions of the decision classes, controlling the error rate of an automated decision-making system may be impossible. This is particularly true when the number of potential classes is large or when the class distributions are sufficiently close, both of which significantly increase the difficulty of the problem.

This phenomenon is illustrated in Figure 1, where the task is to classify observations as High-Risk or Low-Risk while maintaining an error rate below 5%. In this example, the data are modeled as a mixture of two normal distributions, with the High-Risk and Low-Risk classes centered at means of 2 and 1, respectively, and sharing a variance of 1. The Bayes classifier is represented by the dotted line in the leftmost plot of Figure 1. In this scenario, the Bayes optimal error rate is 15.9%, far exceeding our target classification error of 5%.
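To make the abstention mechanism concrete, the sketch below works through a version of this two-Gaussian example: it estimates the Bayes error by Monte Carlo, then raises a posterior-confidence threshold until the error among classified (non-abstained) points falls below the 5% target. This is an illustrative sketch, not the paper's procedure; the equal class priors and single-Gaussian class conditionals are assumptions made here, so the printed Bayes error will not match the 15.9% quoted for Figure 1, whose mixture weights are not specified in this excerpt.

```python
import numpy as np
from scipy.stats import norm

# Illustrative setup (assumption): equal class priors and a single
# unit-variance Gaussian per class. Figure 1's mixture weights are not
# given here, so the Bayes error below (~31%) differs from the quoted 15.9%.
mu_low, mu_high, sigma = 1.0, 2.0, 1.0
rng = np.random.default_rng(0)
n = 200_000
y = rng.integers(0, 2, n)                      # 0 = Low-Risk, 1 = High-Risk
x = rng.normal(np.where(y == 1, mu_high, mu_low), sigma)

# Posterior probability of High-Risk under the assumed model (equal priors).
p_high = norm.pdf(x, mu_high, sigma) / (
    norm.pdf(x, mu_high, sigma) + norm.pdf(x, mu_low, sigma)
)
confidence = np.maximum(p_high, 1.0 - p_high)  # max posterior probability
pred = (p_high > 0.5).astype(int)              # Bayes rule: threshold at x = 1.5

print(f"Bayes error (every point classified): {np.mean(pred != y):.3f}")

# Selective classification: abstain whenever confidence < tau, and pick the
# smallest tau whose error among *classified* points meets the 5% target.
target = 0.05
for tau in np.linspace(0.5, 0.999, 500):
    keep = confidence >= tau
    if keep.any() and np.mean(pred[keep] != y[keep]) <= target:
        print(f"tau = {tau:.3f}: error {np.mean(pred[keep] != y[keep]):.3f} "
              f"on {keep.mean():.1%} of points; abstain on the rest")
        break
```

Under these assumed settings only a small fraction of points clears the calibrated threshold, which illustrates the paper's point: when the class distributions are close, meeting a strict accuracy target forces many indecisions.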
Dec-17-2024