Confusion Matrices and Accuracy Statistics for Binary Classifiers Using Unlabeled Data: The Diagnostic Test Approach

Evans, Richard

arXiv.org Artificial Intelligence 

Sometimes it is important to know the accuracy of a classifier on unlabeled data. The labels may be delayed, as in consumer purchasing predictions, or obtaining them may be cost-prohibitive. The labels may not exist at all, as for some medical conditions for which the true gold standard diagnostic test (a 100% sensitive and 100% specific classifier) would require that subjects be euthanized and autopsied to obtain labels. Epidemiologists and biostatisticians have developed statistical methods for assessing the sensitivity (Se) and specificity (Sp) of diagnostic tests when gold standard comparison tests are unavailable. In data science terms, the diagnostic test assessment data are unlabeled. In this article, I describe how to modify those diagnostic test statistical methods to estimate confusion matrices and accuracy statistics for binary classifiers.
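To make the link between Se, Sp, and a confusion matrix concrete, the following minimal sketch shows only the standard arithmetic by which estimates of sensitivity, specificity, and prevalence imply expected confusion matrix cells and accuracy statistics. It is not the estimation procedure of the article: the function names are hypothetical, and the input values (Se = 0.9, Sp = 0.8, prevalence = 0.25, n = 1000) are invented for illustration and assumed to have been estimated elsewhere.

```python
def expected_confusion_matrix(se, sp, prevalence, n):
    """Expected cell counts implied by Se, Sp, prevalence, and sample size n.

    All four inputs are assumed to be already estimated; this function
    only performs the bookkeeping, not the estimation itself.
    """
    positives = n * prevalence      # expected number of true positives-class cases
    negatives = n - positives       # expected number of true negative-class cases
    tp = se * positives             # positive cases the classifier flags
    fn = positives - tp             # positive cases the classifier misses
    tn = sp * negatives             # negative cases the classifier clears
    fp = negatives - tn             # negative cases the classifier flags
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


def accuracy_stats(cm):
    """Accuracy, positive predictive value, and negative predictive value."""
    total = sum(cm.values())
    predicted_pos = cm["TP"] + cm["FP"]
    predicted_neg = cm["TN"] + cm["FN"]
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "ppv": cm["TP"] / predicted_pos if predicted_pos else float("nan"),
        "npv": cm["TN"] / predicted_neg if predicted_neg else float("nan"),
    }


# Illustrative (made-up) inputs, not results from the article.
cm = expected_confusion_matrix(se=0.9, sp=0.8, prevalence=0.25, n=1000)
stats = accuracy_stats(cm)
print(cm)
print(stats)
```

The sketch treats prevalence as a third required quantity alongside Se and Sp, since the cell counts of a confusion matrix depend on how many cases of each class are present in the data.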
