On the Interconnections of Calibration, Quantification, and Classifier Accuracy Prediction under Dataset Shift
arXiv.org Artificial Intelligence
Classifiers are often deployed in contexts in which the independent and identically distributed (IID) assumption is violated, i.e., in which the data used to train the model and the future data to be classified are not drawn from the same distribution. This situation is generally referred to as dataset shift in the machine learning literature [Storkey, 2009]. In this context, three problems have gained increased attention in recent years. Classifier calibration [Flach and Webb, 2016, Silva Filho et al., 2023] concerns the manipulation of the confidence scores produced by a classifier so that they effectively reflect the likelihood that a given instance is positive. Quantification [González et al., 2017, Esuli et al., 2023] is instead concerned with estimating the prevalence of the classes of interest in an unlabelled set. Finally, classifier accuracy prediction aims at inferring how well a classifier will fare on unseen data [Elsahar and Gallé, 2019, Guillory et al., 2021]. When the IID assumption holds, well-established procedures for attaining these three goals are known and routinely used. For instance, the classifier's outputs can be calibrated by learning a calibration map (a function mapping classifier confidence scores into values reflecting the likelihood of the positive class) on held-out validation data [Platt, 2000, Zadrozny and Elkan, 2001a, Barlow and Brunk, 1972].
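To make the notion of a calibration map concrete, the following is a minimal sketch of one possible choice: an isotonic (non-decreasing) step function fitted on held-out validation scores with the pool-adjacent-violators procedure. The function names `fit_isotonic_map` and `calibrate` are hypothetical and introduced only for illustration, the sketch assumes binary labels in {0, 1} and does not pool tied scores, and it is not presented as the method used in the paper.

```python
from bisect import bisect_left


def fit_isotonic_map(scores, labels):
    """Fit a non-decreasing step function from raw scores to positive-class rates.

    Hypothetical helper: uses pool-adjacent-violators (PAV) on validation data.
    Returns the upper score bound and the fitted value of each block.
    """
    pairs = sorted(zip(scores, labels))
    # Each block stores [sum of labels, number of points, largest score in block].
    blocks = []
    for score, label in pairs:
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def calibrate(score, uppers, values):
    """Map a raw score to the fitted value of the first block whose bound covers it."""
    idx = bisect_left(uppers, score)
    # Scores above every bound fall back to the last (highest) block.
    return values[min(idx, len(values) - 1)]


if __name__ == "__main__":
    # Toy held-out validation set (made-up numbers, for illustration only).
    val_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    val_labels = [0, 0, 1, 0, 1, 0, 1, 1]
    uppers, values = fit_isotonic_map(val_scores, val_labels)
    for raw in (0.05, 0.35, 0.95):
        print(raw, calibrate(raw, uppers, values))
```

Because the map is fitted on validation data drawn from the training distribution, it inherits the IID assumption: under dataset shift the fitted values need not reflect the positive-class rates of the data actually being classified.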
May-19-2025