On the Interconnections of Calibration, Quantification, and Classifier Accuracy Prediction under Dataset Shift

Moreo, Alejandro

arXiv.org Artificial Intelligence 

Classifiers are often deployed in contexts in which the independent and identically distributed (IID) assumption is violated, i.e., in which the data used to train the model and the future data to be classified are not drawn from the same distribution. This situation is generally referred to as dataset shift in the machine learning literature [Storkey, 2009]. In this context, three problems have gained increasing attention in recent years. Classifier calibration [Flach and Webb, 2016, Silva Filho et al., 2023] concerns the manipulation of the confidence scores produced by a classifier so that they reflect the likelihood that a given instance is positive. Quantification [González et al., 2017, Esuli et al., 2023] is instead concerned with estimating the prevalence of the classes of interest in an unlabelled set. Finally, classifier accuracy prediction aims at inferring how well a classifier will perform on unseen data [Elsahar and Gallé, 2019, Guillory et al., 2021]. Well-established procedures for attaining these three goals when the IID assumption holds are known and routinely used. For instance, a classifier's outputs can be calibrated by learning a calibration map (a function mapping classifier confidence scores into values reflecting the likelihood of the positive class) on held-out validation data [Platt, 2000, Zadrozny and Elkan, 2001a, Barlow and Brunk, 1972].
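To make the idea of a calibration map learned on held-out validation data concrete, here is a minimal sketch, assuming binary labels in {0, 1} and real-valued classifier scores. It implements a simplified variant of Platt-style sigmoid scaling (plain batch gradient descent on the log-loss, without the target smoothing of Platt's original procedure); the function names `fit_platt` and `_sigmoid`, the learning rate, the number of epochs, and the validation data are all hypothetical choices for illustration, not the method used in this work.

```python
import math
from typing import Callable, Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, epochs: int = 2000) -> Callable[[float], float]:
    """Learn a calibration map s -> sigmoid(a*s + b) on held-out validation data.

    Hypothetical, simplified Platt-style scaling: plain batch gradient descent
    on the mean log-loss, without the target smoothing of Platt's original method.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            # For the logistic model, d(log-loss)/d(logit) = p - y
            err = _sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n

    def calibrate(s: float) -> float:
        # Map a raw confidence score to an estimate of P(y = 1 | s)
        return _sigmoid(a * s + b)

    return calibrate


if __name__ == "__main__":
    # Hypothetical validation scores (e.g., decision-function values) and true labels
    val_scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -1.0, 1.0, 0.0]
    val_labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1]
    calibrate = fit_platt(val_scores, val_labels)
    for s in (-2.0, 0.0, 2.0):
        print(f"score={s:+.1f} -> calibrated P(positive)={calibrate(s):.3f}")
```

The sigmoid family is only one possible choice of calibration map; isotonic regression [Zadrozny and Elkan, 2001a, Barlow and Brunk, 1972] fits a non-parametric monotone map instead. In either case the map is fitted on validation data, so it is only expected to remain valid on future data when the IID assumption holds, which is precisely what dataset shift calls into question.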
