A plug-in approach to maximising precision at the top and recall at the top

Tasche, Dirk

arXiv.org Machine Learning 

Information retrieval and binary classification can, in principle, be considered equivalent problems. Information retrieval means marking documents in a set of candidate documents as relevant or non-relevant to some question, on the basis of the properties of the documents. In binary classification, the problem is to distinguish between the 'positive' and 'negative' instances in a dataset, based on the features of the instances. Hence, from an abstract point of view, information retrieval is a special case of binary classification, with the documents being the instances, the document properties being the features, and 'relevant' being translated as 'positive'.

In practice, however, the general concepts from binary classification are not always helpful for information retrieval applications. The fact that the proportion of relevant documents in a searched set of documents is often small, or even very small, is only one of the reasons why information retrieval is considered a field of research in its own right. As a consequence, some performance measures for information retrieval methods differ from those used for binary classifiers, or are called by different names. Precision and recall are possibly the most popular performance measures for information retrieval methods (see Chapter 8 of Manning et al., 2008, for a list of performance measures):

- Precision is the proportion of documents (instances) that are truly relevant (positive) among those documents that have been predicted relevant (positive). The term precision is also commonly used, with the same meaning, in binary classification.
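As a minimal illustration of the definition above, the sketch below computes precision from parallel lists of true and predicted relevance labels. It also computes recall using its standard definition (the proportion of truly relevant documents that have been predicted relevant), which this excerpt names but does not spell out. The function name `precision_recall` and the convention of returning 0.0 when a denominator is zero are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence, Tuple


def precision_recall(relevant: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall) for parallel sequences of labels.

    relevant[i]  -- True if document i is truly relevant (positive)
    predicted[i] -- True if document i was predicted relevant (positive)
    Hypothetical helper for illustration only.
    """
    if len(relevant) != len(predicted):
        raise ValueError("label sequences must have the same length")
    # True positives: documents predicted relevant that are truly relevant.
    tp = sum(1 for r, p in zip(relevant, predicted) if r and p)
    n_predicted = sum(1 for p in predicted if p)  # documents predicted relevant
    n_relevant = sum(1 for r in relevant if r)    # documents truly relevant
    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    precision = tp / n_predicted if n_predicted else 0.0
    recall = tp / n_relevant if n_relevant else 0.0
    return precision, recall


# 10 candidate documents, only 2 of which are truly relevant.
relevant = [True, False, False, False, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, True, False, False, False]
print(precision_recall(relevant, predicted))  # prints (0.3333333333333333, 0.5)
```

Here one of the three documents predicted relevant is truly relevant (precision 1/3), and one of the two truly relevant documents is retrieved (recall 1/2). Note that labelling every document non-relevant would already be correct for 8 of the 10 documents, yet it would retrieve nothing; with small proportions of relevant documents, precision and recall are more informative than such overall hit rates.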
