Confusion Matrices and Accuracy Statistics for Binary Classifiers Using Unlabeled Data: The Diagnostic Test Approach
arXiv.org Artificial Intelligence
It is sometimes necessary to estimate the accuracy of a classifier on unlabeled data. The labels may be delayed, as in consumer purchasing predictions, or prohibitively expensive to obtain. They may not exist at all, as for some medical conditions for which the true gold standard diagnostic test (a 100% sensitive and 100% specific classifier) would require that subjects be euthanized and autopsied. Epidemiologists and biostatisticians have developed statistical methods for assessing the sensitivity (Se) and specificity (Sp) of diagnostic tests when no gold standard comparison test is available. In data science terms, the data used to assess such diagnostic tests are unlabeled. In this article, I describe how to adapt those statistical methods to estimate confusion matrices and accuracy statistics for binary classifiers.
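As a rough illustration of how Se and Sp relate to a confusion matrix, the sketch below converts assumed estimates of sensitivity, specificity, and prevalence into expected cell counts and overall accuracy. It is not the estimation method described in the article: it assumes those three quantities have already been estimated by some means, and the function name `expected_confusion_matrix` and the example values are hypothetical.

```python
def expected_confusion_matrix(se, sp, prevalence, n):
    """Expected confusion-matrix cells for n cases.

    se, sp, and prevalence are assumed to be estimates obtained elsewhere
    (for example, from a diagnostic-test style analysis); this function only
    applies the standard definitions of sensitivity and specificity.
    """
    for name, value in (("se", se), ("sp", sp), ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if n <= 0:
        raise ValueError("n must be positive")

    positives = n * prevalence      # expected number of truly positive cases
    negatives = n - positives       # expected number of truly negative cases

    tp = positives * se             # true positives: positives flagged as positive
    fn = positives - tp             # false negatives: positives that were missed
    tn = negatives * sp             # true negatives: negatives flagged as negative
    fp = negatives - tn             # false positives: negatives flagged as positive

    accuracy = (tp + tn) / n        # equals prevalence*se + (1 - prevalence)*sp
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical values chosen only to show the arithmetic.
    cells = expected_confusion_matrix(se=0.9, sp=0.8, prevalence=0.25, n=1000)
    for key, value in cells.items():
        print(key, round(value, 3))
```

The cell counts are expectations, so they need not be whole numbers. In practice the uncertainty in the Se, Sp, and prevalence estimates would carry through to every cell.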
Dec-27-2022
- Country:
- North America > United States > Minnesota (0.04)
- Genre:
- Research Report (0.40)
- Industry:
- Health & Medicine > Health Care Technology (1.00)
- Technology: