Kernel Density Estimation for Multiclass Quantification

Moreo, Alejandro, González, Pablo, del Coz, Juan José

arXiv.org Machine Learning 

Quantification (variously called learning to quantify or class prevalence estimation) is the area of supervised machine learning concerned with estimating the percentages of instances from a population (hereafter, a bag of examples) belonging to each of the classes of interest [González et al., 2017, Esuli et al., 2023]. Quantification finds applications in many disciplines, like the social sciences, epidemiology, or market research, in which the interest lies at the aggregate level, i.e., in which inferring characteristics of the single individual (e.g., via classification, or via regression) is of little concern since knowing group-level information is all we need. Despite the fact that binary quantification (i.e., the setting in which the classes of interest are positive vs. negative) has been, by far, the most studied scenario in the quantification literature [Card and Smith, 2018, Forman, 2008, Bella et al., 2010, Esuli and Sebastiani, 2015, Hassan et al., 2020, Moreo and Sebastiani, 2021], the truth is that many of the applications of quantification naturally arise in the multiclass regime, i.e., in cases in which there are more than two mutually exclusive classes. Examples of multiclass settings are ubiquitous, and may include the allocation of human resources to different departments in a company [Forman, 2005], the analysis of different phytoplankton species that could exist in a water sample [González et al., 2019], or the analysis of the various causes of death studied in verbal autopsies [King and Lu, 2008], to name a few. A more concrete example could consist of providing answers to questions like: "What is the percentage of tweets conveying positive, neutral, and negative opinions concerning a specific hashtag?"

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found