We present a theoretical analysis of F -measures for binary, multiclass and multilabel classification.These performance measures are nonlinear, but in many scenarios they are pseudo-linear functions of the per-class false negative/false positive rate. Based on this observation, we present a general reduction of F - measure maximization to cost-sensitive classification with unknown costs. We then propose an algorithm with provable guarantees to obtain an approximately optimal classifier for the F -measure by solving a series of cost-sensitive classification problems.The strength of our analysis is to be valid on any dataset and any class of classifiers, extending the existing theoretical results on F -measures, which are asymptotic in nature. We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various F -measure optimization tasks.
If you are reading this, then you probably tried to predict who will survive the Titanic shipwreck. This Kaggle competition is a canonical example of machine learning, and a right of passage for any aspiring data scientist. What if instead of predicting who will survive, you only had to predict how many will survive? Or, what if you had to predict the average age of survivors, or the sum of the fare that the survivors paid? There are many applications where classification predictions need to be aggregated.
We present a novel modulation level classification (MLC) method based on probability distribution distance functions. The proposed method uses modified Kuiper and Kolmogorov-Smirnov distances to achieve low computational complexity and outperforms the state of the art methods based on cumulants and goodness-of-fit tests. We derive the theoretical performance of the proposed MLC method and verify it via simulations. The best classification accuracy, under AWGN with SNR mismatch and phase jitter, is achieved with the proposed MLC method using Kuiper distances.
In this article, we investigate a family of classification algorithms defined by the principle of empirical risk minimization, in the high dimensional regime where the feature dimension $p$ and data number $n$ are both large and comparable. Based on recent advances in high dimensional statistics and random matrix theory, we provide under mixture data model a unified stochastic characterization of classifiers learned with different loss functions. Our results are instrumental to an in-depth understanding as well as practical improvements on this fundamental classification approach. As the main outcome, we demonstrate the existence of a universally optimal loss function which yields the best high dimensional performance at any given $n/p$ ratio.