Selective Classification Via Neural Network Training Dynamics

Rabanser, Stephan; Thudi, Anvith; Hamidieh, Kimia; Dziedzic, Adam; Papernot, Nicolas

arXiv.org Artificial Intelligence 

Machine learning (ML) is increasingly deployed in high-stakes decision-making environments, where it is critical to detect inputs that the model could misclassify. This is particularly true when deploying deep neural networks (DNNs) for applications with a low tolerance for false positives (i.e., classifying with a wrong label), such as healthcare (Challen et al., 2019; Mozannar and Sontag, 2020), self-driving (Ghodsi et al., 2021), and law (Vieira et al., 2021). This problem setup is captured by the selective classification (SC) framework, which introduces a gating mechanism to abstain from predicting on individual test points (Geifman and El-Yaniv, 2017). Specifically, SC aims to (i) accept only inputs on which the ML model would achieve high accuracy, while (ii) maintaining high coverage, i.e., accepting as many inputs as possible. Current selective classification techniques take one of two directions: (i) augmenting the architecture of the underlying ML model (Geifman and El-Yaniv, 2019); or (ii) training the model with a purposefully adapted loss function (Gangrade et al., 2021). The unifying principle behind these methods is to modify the training stage in order to accommodate selective classification. In this work, we instead show that these modifications are unnecessary: our method not only matches or outperforms existing work, but is also the only state-of-the-art (SOTA) approach that can be applied to all existing models. Our approach builds on the following observation: when we sequentially optimize a model for one dataset, there is in fact a larger set of datapoints for which the model is also sequentially optimized (Hardt et al., 2016; Bassily et al., 2020; Thudi et al., 2022).
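To make the accuracy/coverage trade-off of the SC framework concrete, the following is a minimal sketch of a generic gating mechanism, assuming a simple confidence threshold on the predicted class probabilities. This is a common baseline and not the training-dynamics method proposed in this work; the function names, the `threshold` parameter, and the toy data are all hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def selective_predict(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Optional[int]]:
    """Gate each prediction: return the argmax label if its probability
    is at least `threshold`, otherwise None (abstain).

    NOTE: thresholding the top probability is an assumed, generic gate,
    not the selection rule of the paper.
    """
    decisions: List[Optional[int]] = []
    for p in probs:
        label = max(range(len(p)), key=lambda k: p[k])
        decisions.append(label if p[label] >= threshold else None)
    return decisions


def coverage_and_accuracy(
    decisions: Sequence[Optional[int]], labels: Sequence[int]
) -> Tuple[float, float]:
    """Coverage = fraction of inputs accepted.
    Accuracy is computed on accepted inputs only (NaN if none accepted)."""
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(accepted) / len(labels)
    if not accepted:
        return coverage, float("nan")
    accuracy = sum(d == y for d, y in accepted) / len(accepted)
    return coverage, accuracy


# Toy example (made-up probabilities and labels).
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1, 1]

decisions = selective_predict(probs, threshold=0.7)
coverage, accuracy = coverage_and_accuracy(decisions, labels)

# Accepting everything (threshold 0.0) gives full coverage but lower accuracy.
full_coverage, full_accuracy = coverage_and_accuracy(
    selective_predict(probs, threshold=0.0), labels
)
```

On this toy data, raising the threshold rejects the two low-confidence points, both of which would have been misclassified, so accuracy on the accepted inputs rises while coverage falls. This is the tension that goals (i) and (ii) above describe.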
