On the Interconnections of Calibration, Quantification, and Classifier Accuracy Prediction under Dataset Shift

May-19-2025–arXiv.org Artificial Intelligence

Classifiers are often deployed in contexts in which the independent and identically distributed (IID) assumption is violated, i.e., in which the data used to train the model and the future data to be classified are not drawn from the same distribution. This situation is generally referred to as dataset shift in the machine learning literature [Storkey, 2009]. In this context, three problems have gained increasing attention in recent years. Classifier calibration [Flach and Webb, 2016, Silva Filho et al., 2023] concerns adjusting the confidence scores produced by a classifier so that they reflect the likelihood that a given instance is positive. Quantification [González et al., 2017, Esuli et al., 2023] is instead concerned with estimating the prevalence of the classes of interest in an unlabelled set. Finally, classifier accuracy prediction aims at inferring how well a classifier will fare on unseen data [Elsahar and Gallé, 2019, Guillory et al., 2021]. When the IID assumption holds, well-established procedures for attaining these three goals are known and routinely used. For instance, a classifier's outputs can be calibrated by learning a calibration map (a function mapping classifier confidence scores into values reflecting the likelihood of the positive class) on held-out validation data [Platt, 2000, Zadrozny and Elkan, 2001a, Barlow and Brunk, 1972].
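
As a minimal sketch of the calibration map described above, the following Python fragment learns one by equal-width binning on held-out validation data. Binning is only one simple choice of map, used here for illustration; the function name `fit_binning_calibrator`, the `n_bins` parameter, and the toy scores and labels are hypothetical and do not come from the paper. The sketch assumes binary labels, scores in [0, 1], and validation data drawn from the same distribution as the data to be classified.

```python
from bisect import bisect_right


def fit_binning_calibrator(scores, labels, n_bins=5):
    """Learn a calibration map from held-out (score, label) pairs.

    Assumptions (illustrative, not from the paper): scores lie in [0, 1]
    and labels are 0 (negative) or 1 (positive).
    Returns a function mapping a raw score to the fraction of positives
    observed among validation instances falling in the same bin.
    """
    # Interior edges of n_bins equal-width bins over [0, 1].
    edges = [i / n_bins for i in range(1, n_bins)]
    positives = [0] * n_bins
    totals = [0] * n_bins
    for score, label in zip(scores, labels):
        b = bisect_right(edges, score)  # index of the bin containing score
        positives[b] += label
        totals[b] += 1
    # Empty bins fall back to the bin midpoint (an arbitrary choice).
    rates = [
        positives[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]

    def calibrate(score):
        return rates[bisect_right(edges, score)]

    return calibrate


# Toy held-out validation data (hypothetical values).
val_scores = [0.1, 0.15, 0.3, 0.35, 0.6, 0.65, 0.9, 0.95]
val_labels = [0, 0, 0, 1, 1, 0, 1, 1]

calibrate = fit_binning_calibrator(val_scores, val_labels, n_bins=4)
print([calibrate(s) for s in (0.2, 0.4, 0.99)])
```

The returned function can then be applied to the confidence scores of unseen instances. The positive rates it encodes are estimated on validation data, so they may not carry over when the data to be classified follow a different distribution.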

classifier, data mining, machine learning

Country:
- North America
  - United States
    - Oregon > Multnomah County
      - Portland (0.04)
    - Massachusetts
      - Suffolk County > Boston (0.04)
      - Middlesex County > Cambridge (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Lithuania > Vilnius County
    - Vilnius (0.04)
  - Italy
    - Tuscany > Pisa Province
      - Pisa (0.04)
    - Emilia-Romagna > Metropolitan City of Bologna
      - Bologna (0.04)
  - France > Auvergne-Rhône-Alpes
    - Isère > Grenoble (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China > Hong Kong (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine (0.67)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning
      - Statistical Learning (1.00)
      - Performance Analysis > Accuracy (1.00)
      - Neural Networks (0.68)
      - Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
