Classification from Pairwise Similarities/Dissimilarities and Unlabeled Data via Empirical Risk Minimization

Shimada, Takuya, Bao, Han, Sato, Issei, Sugiyama, Masashi

Apr-26-2019–arXiv.org Machine Learning

In supervised classification, we need a vast amount of labeled training data to train our classifiers. However, labels are often hard to obtain because of high labeling costs [Chapelle et al., 2010], privacy concerns [Warner, 1965], social bias [Nederhof, 1985], and the difficulty of labeling the data itself. For these reasons, in some real-world classification problems, pairwise similarities (i.e., pairs of samples in the same class) and pairwise dissimilarities (i.e., pairs of samples in different classes) may be easier to collect than fully labeled data. For example, in protein function prediction [Klein et al., 2002], knowledge about similarities/dissimilarities can be obtained by experimental means and used as additional supervision. To handle such pairwise information, similar-unlabeled (SU) classification [Bao et al., 2018] has been proposed, in which the classification risk is estimated in an unbiased fashion from only similar pairs and unlabeled data. Although that work assumed that only similar pairs and unlabeled data are available, dissimilar pairs may also be obtainable in practice. In that case, a method that can handle similar pairs, dissimilar pairs, and unlabeled data together is desirable. Semi-supervised clustering [Wagstaff et al., 2001] is one approach that can handle both similar and dissimilar pairs: must-link pairs (i.e., similar pairs) and cannot-link pairs (i.e., dissimilar pairs) are used to obtain meaningful clusters.
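To make the three kinds of supervision concrete, the following minimal sketch simulates them from a small labeled set: similar pairs, dissimilar pairs, and unlabeled samples. It only illustrates the definitions given above and is not the authors' data-generation procedure; the function name `make_pairwise_data` and its parameters are hypothetical, and in a real application the class labels would be hidden from the learner.

```python
import random


def make_pairwise_data(samples, labels, n_pairs, seed=0):
    """Simulate pairwise supervision from labels that are normally hidden.

    Hypothetical helper for illustration only. Returns similar pairs
    (same class), dissimilar pairs (different classes), and the samples
    with their labels dropped.
    """
    rng = random.Random(seed)
    similar, dissimilar = [], []
    for _ in range(n_pairs):
        # Draw two distinct indices uniformly at random.
        i, j = rng.sample(range(len(samples)), 2)
        pair = (samples[i], samples[j])
        if labels[i] == labels[j]:
            similar.append(pair)      # must-link pair
        else:
            dissimilar.append(pair)   # cannot-link pair
    # Unlabeled data: the same samples without their class labels.
    unlabeled = list(samples)
    return similar, dissimilar, unlabeled
```

A learner in this setting would receive only the three returned collections, never `labels`.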

artificial intelligence, classification, machine learning


Country:
- Asia (0.28)

Genre:
- Research Report (0.64)

Industry:
- Information Technology > Security & Privacy (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Unsupervised or Indirectly Supervised Learning (1.00)
  - Statistical Learning > Clustering (0.93)
