A Log-linear Gradient Descent Algorithm for Unbalanced Binary Classification using the All Pairs Squared Hinge Loss

Rust, Kyle R., Hocking, Toby D.

arXiv.org Artificial Intelligence 

Binary classification is an important problem in many areas, such as computer vision, natural language processing, and bioinformatics. Binary classification learning algorithms produce a function that outputs a real-valued predicted score, where a larger score means the example is more likely to belong to the positive class. The prediction accuracy of a learned binary classification model can be quantified using the zero-one loss, which corresponds to thresholding the predicted score at zero. Because it considers only one prediction threshold (the default), this evaluation metric can be problematic or misleading in some cases, such as data sets with extreme class imbalance or comparisons between models with different false positive rates. A more comprehensive and fair evaluation method uses the Receiver Operating Characteristic (ROC) curve, which plots the True Positive Rate against the False Positive Rate over all thresholds of the predicted score [Egan and Egan, 1975]. The Area Under the ROC Curve (AUC) takes values between zero and one; constant, random, or uninformed predictions yield AUC=0.5, and a set of perfect predictions achieves AUC=1. It is therefore desirable to create learning algorithms that maximize AUC, and this criterion is often used for hyper-parameter selection. However, the AUC cannot be used directly for gradient descent learning, because it is a piecewise constant function of the predicted values (its gradient is zero almost everywhere). Various authors have proposed to work around this issue by using convex relaxations of the Mann-Whitney statistic [Bamber, 1975], which involves a double sum over all pairs of positive and negative examples.
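The pairwise view of the AUC can be made concrete with a short sketch, assuming scores are plain Python floats. The hypothetical function `auc_mann_whitney` computes the AUC as the normalized Mann-Whitney statistic (the fraction of positive/negative pairs ranked correctly, with ties counted as one half), and `squared_hinge_all_pairs_naive` computes one convex relaxation, a squared hinge loss with an assumed margin of 1, as the literal double sum in quadratic time. The third function, `squared_hinge_all_pairs_sorted`, illustrates one way such a double sum can be evaluated in log-linear time by sorting and expanding the square; it is an illustrative assumption and is not claimed to be the authors' algorithm.

```python
import bisect


def auc_mann_whitney(pos, neg):
    """AUC as the normalized Mann-Whitney statistic: the fraction of
    (positive, negative) pairs where the positive score is larger,
    with ties counted as 1/2. Runs in O(n_pos * n_neg) time."""
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos) * len(neg))


def squared_hinge_all_pairs_naive(pos, neg, margin=1.0):
    """Convex relaxation of the pair counts (margin of 1 is an assumption):
    sum over all pairs of max(0, margin - (p - q))**2, as a double sum."""
    return sum(max(0.0, margin - (p - q)) ** 2 for p in pos for q in neg)


def squared_hinge_all_pairs_sorted(pos, neg, margin=1.0):
    """Same value as the naive version, in O((n_pos + n_neg) log n_neg) time.

    For a positive score p, the pair (p, q) contributes (a - p)**2 with
    a = q + margin whenever a > p, and 0 otherwise. Expanding the square,
    the sum over {a > p} of (a - p)**2 equals S2 - 2*p*S1 + c*p**2, where
    c, S1, S2 are the count, sum, and sum of squares of the a values above p.
    """
    a = sorted(q + margin for q in neg)  # shifted negative scores, ascending
    n = len(a)
    # Suffix sums: s1[i] = sum(a[i:]), s2[i] = sum of squares of a[i:].
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        s1[i] = s1[i + 1] + a[i]
        s2[i] = s2[i + 1] + a[i] * a[i]
    loss = 0.0
    for p in pos:
        k = bisect.bisect_right(a, p)  # a[k:] are exactly the values > p
        c = n - k
        loss += s2[k] - 2.0 * p * s1[k] + c * p * p
    return loss


if __name__ == "__main__":
    pos = [2.0, 0.5, 1.2]
    neg = [0.0, 1.0, -0.5, 0.7]
    print(auc_mann_whitney(pos, neg))
    print(squared_hinge_all_pairs_naive(pos, neg))
    print(squared_hinge_all_pairs_sorted(pos, neg))
```

With positive scores `[2.0, 0.5, 1.2]` and negative scores `[0.0, 1.0, -0.5, 0.7]`, 10 of the 12 pairs are ranked correctly, so the AUC is 10/12, and both loss functions return 4.83 up to floating-point rounding. The sorted version only needs the count, sum, and sum of squares of the shifted negative scores above each positive score, which is why precomputed suffix sums plus a binary search are enough.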
