What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes
Fortunately, there are better alternatives… What the F-measure is: the F-measure is a harmonic mean of Recall and Precision. There are several motivations for this choice of mean. In particular, the harmonic mean is commonly appropriate when averaging rates or frequencies, but there is also a set-theoretic reason we will discuss later. Precision is the frequency with which retrieved documents or predictions are relevant or 'correct', and is properly a form of Accuracy, also known as Positive Predictive Value (PPV) or True Positive Accuracy (TPA). F is intended to combine Recall and Precision into a single measure of search 'effectiveness'. One of the problems with Recall, Precision, F-measure and Accuracy as used in Information Retrieval is that they are easily biased. To better understand the relationships between these measures it is useful to give their formulae in two forms, one related to the raw counts and one related to normalized frequencies (Equation 1 and Table 1). These statistics are all appropriate when there is one class of items that is of interest or relevance out of a larger set of N items or instances.
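The two forms of the formulae mentioned above can be illustrated with a minimal Python sketch. It is not taken from the paper: the `Contingency` class, the function names and the example counts are all hypothetical, and the sketch assumes the standard contingency-table definitions (true/false positives and negatives) with non-zero denominators. It computes F once from Recall and Precision on raw counts and once from counts normalized by N, then varies only the number of true negatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Raw counts for one class of interest out of N items (hypothetical helper)."""
    tp: int  # relevant and retrieved
    fp: int  # retrieved but not relevant
    fn: int  # relevant but not retrieved
    tn: int  # neither relevant nor retrieved

    @property
    def n(self) -> int:
        # Total number of items or instances, N.
        return self.tp + self.fp + self.fn + self.tn


def recall(c: Contingency) -> float:
    # Fraction of relevant items that were retrieved.
    return c.tp / (c.tp + c.fn)


def precision(c: Contingency) -> float:
    # Fraction of retrieved items that are relevant.
    return c.tp / (c.tp + c.fp)


def accuracy(c: Contingency) -> float:
    # Fraction of all N items that were handled correctly.
    return (c.tp + c.tn) / c.n


def f_measure(c: Contingency) -> float:
    # Harmonic mean of Recall and Precision, from raw counts.
    r, p = recall(c), precision(c)
    return 2 * r * p / (r + p)


def f_measure_from_frequencies(c: Contingency) -> float:
    # Same quantity from counts normalized by N: 2tp / (2tp + fp + fn).
    tp, fp, fn = c.tp / c.n, c.fp / c.n, c.fn / c.n
    return 2 * tp / (2 * tp + fp + fn)


# Two hypothetical collections that differ only in their true negatives.
small = Contingency(tp=40, fp=10, fn=20, tn=30)
large = Contingency(tp=40, fp=10, fn=20, tn=930)

if __name__ == "__main__":
    for name, c in (("small", small), ("large", large)):
        print(name, round(f_measure(c), 3), round(accuracy(c), 3))
```

With these made-up counts, both forms give F = 80/110 ≈ 0.727 for either collection, while Accuracy moves from 0.70 to 0.97. F never sees the true negatives, which is one way to see why these measures respond so differently to the make-up of the collection.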
Mar-22-2015