Alternate Estimation of a Classifier and the Class-Prior from Positive and Unlabeled Data

Kato, Masahiro, Xu, Liyuan, Niu, Gang, Sugiyama, Masashi

arXiv.org Machine Learning 

We consider the problem of learning a binary classifier only from positive data and unlabeled data (PU learning). This problem arises in various practical situations, such as information retrieval and outlier detection (Elkan and Noto, 2008; Ward et al., 2009; Scott and Blanchard, 2009; Blanchard et al., 2010; Li et al., 2009; Nguyen et al., 2011). One of the theoretical milestones of PU learning is Elkan and Noto (2008). Subsequent work known as unbiased PU learning (du Plessis and Sugiyama, 2014; du Plessis et al., 2015) estimates the classification risk in an unbiased manner from PU data alone. We consider the case-control scenario (Ward et al., 2009; Elkan and Noto, 2008), where positive data are obtained separately from unlabeled data and the unlabeled data are sampled from the whole population. In this setting, the true class-prior π = p(y = 1) of the unlabeled data is needed to formulate unbiased PU learning.
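To make the role of π concrete, here is a minimal sketch of the unbiased PU risk estimator in the form commonly attributed to du Plessis et al. (2015). Writing the marginal as p(x) = π p(x | y = +1) + (1 − π) p(x | y = −1) lets the unobservable negative-class term be rewritten with positive and unlabeled expectations: R(g) = π E_P[ℓ(g(x), +1)] − π E_P[ℓ(g(x), −1)] + E_U[ℓ(g(x), −1)]. The sigmoid loss, the fixed scorer `linear_scorer`, the Gaussian toy distributions, and the helper names (`sample_case_control`, `unbiased_pu_risk`, `pn_risk`) are illustrative assumptions, not the authors' alternate-estimation method. The sketch compares the PU estimate against an ordinary risk computed from fully labeled data, then shows that plugging in a wrong prior biases the estimate.

```python
import math
import random


def sigmoid_loss(z: float, y: int) -> float:
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)), bounded in (0, 1)."""
    # Clamp the margin so math.exp cannot overflow on extreme scores.
    margin = max(min(y * z, 50.0), -50.0)
    return 1.0 / (1.0 + math.exp(margin))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def unbiased_pu_risk(g, x_pos, x_unl, prior, loss=sigmoid_loss):
    """Unbiased PU estimate of the classification risk of scorer g:
    pi * E_P[l(g, +1)] - pi * E_P[l(g, -1)] + E_U[l(g, -1)]."""
    risk_pos = mean(loss(g(x), +1) for x in x_pos)
    risk_pos_as_neg = mean(loss(g(x), -1) for x in x_pos)
    risk_unl_as_neg = mean(loss(g(x), -1) for x in x_unl)
    return prior * risk_pos - prior * risk_pos_as_neg + risk_unl_as_neg


def pn_risk(g, x_pos, x_neg, prior, loss=sigmoid_loss):
    """Ordinary risk pi * E_P[l(g, +1)] + (1 - pi) * E_N[l(g, -1)]; needs negatives."""
    return (prior * mean(loss(g(x), +1) for x in x_pos)
            + (1.0 - prior) * mean(loss(g(x), -1) for x in x_neg))


def sample_case_control(n_pos, n_unl, prior, rng):
    """Hypothetical toy data: p(x | y=+1) = N(+1, 1), p(x | y=-1) = N(-1, 1)."""
    # Positive data are drawn separately from the positive class-conditional.
    x_pos = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    # Unlabeled data come from the whole population: pi p(x|+1) + (1 - pi) p(x|-1).
    x_unl = [rng.gauss(1.0 if rng.random() < prior else -1.0, 1.0)
             for _ in range(n_unl)]
    return x_pos, x_unl


def linear_scorer(x: float) -> float:
    """A fixed scorer g(x) = x, chosen only for illustration."""
    return x


rng = random.Random(0)
prior = 0.3  # assumed known true class-prior pi = p(y = 1)
x_pos, x_unl = sample_case_control(20000, 20000, prior, rng)

# Fully labeled reference sample, used only to check the PU estimate.
x_pos_ref = [rng.gauss(1.0, 1.0) for _ in range(20000)]
x_neg_ref = [rng.gauss(-1.0, 1.0) for _ in range(20000)]

pu_estimate = unbiased_pu_risk(linear_scorer, x_pos, x_unl, prior)
pn_reference = pn_risk(linear_scorer, x_pos_ref, x_neg_ref, prior)
# Plugging in a wrong prior biases the estimate, which is why pi must be known.
pu_wrong_prior = unbiased_pu_risk(linear_scorer, x_pos, x_unl, 0.6)
print(f"PU: {pu_estimate:.3f}  PN: {pn_reference:.3f}  "
      f"PU (wrong prior): {pu_wrong_prior:.3f}")
```

With the correct prior the PU estimate should agree closely with the labeled reference, while the misspecified prior shifts it noticeably. This dependence on π is exactly what motivates estimating the class-prior in the first place.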
