A bagging SVM to learn from positive and unlabeled examples

Mordelet, Fantine; Vert, Jean-Philippe

arXiv.org Machine Learning 

In many applications, such as information retrieval or gene ranking, one is given a finite set of data of interest sharing a particular property, and wishes to find other data sharing the same property. In information retrieval, for example, the finite set can be a user query, or a set of documents known to belong to a specific category, and the goal is to scan a large database of documents to identify new documents related to the query or belonging to the same category. In gene ranking, the query is a finite list of genes known to have a given function or to be associated with a given disease, and the goal is to identify new genes sharing the same property (Aerts et al., 2006). In fact, this setting is ubiquitous in applications where identifying data of interest is difficult or expensive, e.g., because human intervention is necessary or expensive experiments are needed, while unlabeled data can be easily collected. In such cases there is a clear opportunity to alleviate the burden and cost of identifying interesting data with the help of machine learning techniques.

More formally, let us assign a binary label to each possible data point: positive (+1) for data of interest, negative (−1) for other data. Unlabeled data are data for which we do not know whether they are interesting or not. Denoting by X the set of data, we assume that the "query" is a finite set of data P {x
