Classification from Positive, Unlabeled and Biased Negative Data

Hsieh, Yu-Guan, Niu, Gang, Sugiyama, Masashi

arXiv.org Machine Learning 

In conventional binary classification, examples are labeled as either positive (P) or negative (N), and we train a classifier on these labeled examples. In contrast, positive-unlabeled (PU) learning addresses the problem of learning a classifier from P and unlabeled (U) data, without the need to explicitly identify N data (Elkan & Noto, 2008; Ward et al., 2009). PU learning is useful in many real-world problems. For example, in one-class remote sensing classification (Li et al., 2011), we seek to extract a specific land-cover class from an image. While it is easy to label examples of the land-cover class of interest, examples not belonging to this class are too diverse to be exhaustively annotated. The same problem arises in text classification, as it is difficult or even impossible to compile a set of N samples that comprehensively characterizes everything that is not in the P class (Liu et al., 2003; Fung et al., 2006). PU learning has also been applied to other domains such as outlier detection (Hido et al., 2008; Scott & Blanchard, 2009), medical diagnosis (Zuluaga et al., 2011), and time series classification (Nguyen et al., 2011). Examining the above examples, we find that the most difficult step is often collecting a fully representative N set, whereas labeling a small portion of all possible N data is relatively easy. Therefore, in this paper, we propose to study the problem of learning from P, U and biased N (bN) data, which we name PUbN learning hereinafter.
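To make the PU setting concrete, the cited Elkan & Noto (2008) approach trains a "nontraditional" classifier g(x) ≈ p(s=1|x) to separate labeled positives from unlabeled examples, estimates the labeling frequency c = p(s=1|y=1) as the mean of g on held-out labeled positives, and recovers p(y=1|x) = g(x)/c. The following is a minimal sketch of just the correction step; the function name and the example scores are illustrative, not from the paper.

```python
import numpy as np

def elkan_noto_correct(scores, heldout_pos_scores):
    """Convert nontraditional scores g(x) ~ p(s=1|x) into estimates of p(y=1|x).

    scores             : g(x) evaluated on the examples of interest
    heldout_pos_scores : g(x) on held-out *labeled* positives, used to
                         estimate c = p(s=1|y=1) as their mean (Elkan & Noto, 2008)
    """
    c = np.mean(heldout_pos_scores)          # estimate of the labeling frequency
    # p(y=1|x) = g(x)/c; clip because the estimate can exceed 1 in finite samples
    return np.clip(np.asarray(scores) / c, 0.0, 1.0)

# Illustrative numbers: held-out positives score 0.7-0.9, so c ≈ 0.8,
# and a score of 0.4 on an unlabeled example corrects to 0.5.
corrected = elkan_noto_correct([0.4, 0.88], [0.8, 0.9, 0.7])
```

This correction assumes the "selected completely at random" condition, i.e., labeled positives are an unbiased sample of all positives; the biased-negative setting studied in this paper is precisely a departure from such unbiasedness on the N side.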
