Reviews: To Trust Or Not To Trust A Classifier

Neural Information Processing Systems 

This paper proposes a "trust" score intended to reliably identify whether a classifier's prediction is correct. The main intuition is that a prediction is more reliable if the instance is closer to the predicted class than to any other class. By defining an alpha-high-density set, the paper is able to provide theoretical guarantees for the proposed algorithm, which is its main strength. The paper then evaluates the "trust" score on a variety of datasets. One cool experiment shows that the "trust" score estimated from deeper layers performs better than the one estimated from lower layers.
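
To make the intuition above concrete, here is a minimal sketch of a distance-based score. It assumes the score is the ratio of the distance to the nearest other class over the distance to the predicted class, uses plain Euclidean nearest-neighbor distances, and skips the alpha-high-density-set filtering step entirely. The function name `trust_score` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def trust_score(x, predicted_label, train_X, train_y, eps=1e-12):
    """Illustrative score: (distance to nearest other class) / (distance to predicted class).

    Assumption: this ratio form and the Euclidean metric are chosen for
    illustration; the density-based filtering of training points is omitted.
    """
    x = np.asarray(x, dtype=float)
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)

    # Euclidean distance from x to every training point.
    dists = np.linalg.norm(train_X - x, axis=1)

    same = train_y == predicted_label
    if not same.any() or same.all():
        raise ValueError("need training points in the predicted class and in at least one other class")

    d_pred = dists[same].min()    # nearest point of the predicted class
    d_other = dists[~same].min()  # nearest point of any other class

    # eps guards against division by zero when x coincides with a training point.
    return d_other / (d_pred + eps)


# Toy data: two well-separated classes in the plane.
train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = [0, 0, 1, 1]
query = [0.5, 0.5]

score_if_class_0 = trust_score(query, 0, train_X, train_y)  # query sits near class 0
score_if_class_1 = trust_score(query, 1, train_X, train_y)  # query sits far from class 1
```

Under this sketch, a score above 1 means the instance is closer to the predicted class than to any other class, and a score below 1 means some other class is closer, which is the situation in which the prediction would be flagged as less trustworthy.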