 alice score


Accurate Layerwise Interpretable Competence Estimation

Neural Information Processing Systems

Our contributions are twofold: First, we establish a statistically rigorous definition of competence that generalizes the common notion of classifier confidence; second, we present the ALICE (Accurate Layerwise Interpretable Competence Estimation) Score, a pointwise competence estimator for any classifier.




its semantic meaning: for all points with ALICE score of p, we expect p

Neural Information Processing Systems

The authors would like to thank the reviewers for their thoughtful comments. We have replaced the calibration experiment. Figure 1: ALICE score calibration of ResNet32 trained on CIFAR10. At 50 epochs we reach max validation accuracy. Full experimental details are in the final version.


Accurate Layerwise Interpretable Competence Estimation

Vickram Rajendran, William LeVine

arXiv.org Machine Learning

Estimating machine learning performance 'in the wild' is both an important and unsolved problem. In this paper, we seek to examine, understand, and predict the pointwise competence of classification models. Our contributions are twofold: First, we establish a statistically rigorous definition of competence that generalizes the common notion of classifier confidence; second, we present the ALICE (Accurate Layerwise Interpretable Competence Estimation) Score, a pointwise competence estimator for any classifier. By considering distributional, data, and model uncertainty, ALICE empirically shows accurate competence estimation in common failure situations such as class-imbalanced datasets, out-of-distribution datasets, and poorly trained models. Our contributions allow us to accurately predict the competence of any classification model given any input and error function. We compare our score with state-of-the-art confidence estimators such as model confidence and Trust Score, and show significant improvements in competence prediction over these methods on datasets such as DIGITS, CIFAR10, and CIFAR100.
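The entries above describe a pointwise competence score that should be calibrated: among points scored p, roughly a fraction p should be handled correctly. A minimal sketch of how such a claim could be checked is given below, using plain reliability binning. It is an illustration only and not the paper's evaluation protocol; the names `reliability_bins` and `expected_calibration_error`, the equal-width binning, and the assumption that scores lie in [0, 1] are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    scores: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean score, empirical accuracy, count) for each non-empty bin.

    Assumption: scores are competence estimates in [0, 1], and correct[i]
    says whether the classifier was acceptable on point i.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    score_sum = [0.0] * n_bins
    hit_sum = [0] * n_bins
    count = [0] * n_bins
    for s, c in zip(scores, correct):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
        # Equal-width bins; a score of exactly 1.0 goes in the last bin.
        idx = min(int(s * n_bins), n_bins - 1)
        score_sum[idx] += s
        hit_sum[idx] += 1 if c else 0
        count[idx] += 1

    return [
        (score_sum[i] / count[i], hit_sum[i] / count[i], count[i])
        for i in range(n_bins)
        if count[i] > 0
    ]


def expected_calibration_error(bins: List[Tuple[float, float, int]]) -> float:
    """Count-weighted mean gap between mean score and accuracy across bins."""
    total = sum(n for _, _, n in bins)
    if total == 0:
        return 0.0
    return sum(n * abs(acc - mean) for mean, acc, n in bins) / total


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_scores = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
    toy_correct = [False, False, True, True, True, False]
    toy_bins = reliability_bins(toy_scores, toy_correct, n_bins=2)
    for mean, acc, n in toy_bins:
        print(f"mean score {mean:.2f}  accuracy {acc:.2f}  n={n}")
    print(f"ECE = {expected_calibration_error(toy_bins):.3f}")
```

A well-calibrated score yields bins whose accuracy tracks the mean score closely, so the weighted gap stays near zero. The same check applies to any baseline that outputs a value in [0, 1], such as a model's softmax confidence.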