A bagging SVM to learn from positive and unlabeled examples

Oct-5-2010–arXiv.org Machine Learning

In many applications, such as information retrieval or gene ranking, one is given a finite set of data of interest sharing a particular property and wishes to find other data sharing the same property. In information retrieval, for example, the finite set can be a user query or a set of documents known to belong to a specific category, and the goal is to scan a large database of documents to identify new documents related to the query or belonging to the same category. In gene ranking, the query is a finite list of genes known to have a given function or to be associated with a given disease, and the goal is to identify new genes sharing the same property (Aerts et al., 2006). In fact, this setting is ubiquitous in applications where identifying data of interest is difficult or expensive, e.g., because human intervention is necessary or expensive experiments are needed, while unlabeled data can be easily collected. In such cases there is a clear opportunity to reduce the burden and cost of identifying interesting data with the help of machine learning techniques. More formally, let us assign a binary label to each possible data point: positive (+1) for data of interest, negative (-1) for other data. Unlabeled data are data for which we do not know whether they are interesting or not. Denoting by X the set of data, we assume that the "query" is a finite set of data P {x



