Information-Theoretic Representation Learning for Positive-Unlabeled Classification
Sakai, Tomoya, Niu, Gang, Sugiyama, Masashi
In real-world applications, it is conceivable that only positive and unlabeled (PU) data are available for training a classifier. For instance, in land-cover image classification, images of urban regions can be easily labeled, while images of non-urban regions are difficult to annotate due to high diversity of non-urban regions containing, e.g., forest, seas, grasses, and soil (Li et al., 2011). To cope with such situations, PU classification has been actively studied (Letouzey et al., 2000; Elkan and Noto, 2008; du Plessis et al., 2015), and the state-of-the-art method allows us to systematically train deep neural networks only from PU data (Kiryo et al., 2017). However, existing PU classification methods typically require an estimate of the class-prior probability, and their performance is sensitive to the quality of class-prior estimation (Kiryo et al., 2017). Although various class-prior estimation methods from PU data have been proposed so far (du Plessis and Sugiyama, 2014; Ramaswamy et al., 2016; Jain et al., 2016; du Plessis et al., 2017; Northcutt et al., 2017), accurate estimation of the class-prior is still highly challenging particularly for high-dimensional data.
Feb-12-2018
- Country:
- Europe (0.46)
- North America > United States (0.28)
- Asia > Japan (0.28)
- Genre:
- Research Report (1.00)