Scaling Up ROC-Optimizing Support Vector Machines

Bae, Gimun, Shin, Seung Jun

arXiv.org Machine Learning 

Binary classification is a fundamental problem in machine learning. Given a pair (X, Y), where X is a p-dimensional predictor and Y is a binary response taking values in {-1, 1}, the goal is to learn a decision function f of X that predicts Y by Ŷ = sign{f(X)}. A canonical approach is to choose the f that minimizes the classification error or, equivalently, maximizes the accuracy. For instance, the support vector machine (SVM; Vapnik, 1999) determines the decision function by maximizing the geometric margin, which effectively aligns with maximizing accuracy (Lin, 2002). However, in imbalanced settings where one class is substantially underrepresented, accuracy can be a misleading measure of performance. Even a trivial classifier that always predicts the majority class can achieve high accuracy while completely failing to detect samples from the minority class. As an alternative, the receiver operating characteristic (ROC) curve is widely used to evaluate classifier performance under class imbalance. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold varies, and the area under the ROC curve (AUC) serves as a popular scalar summary. A classifier with a larger AUC value is generally regarded as having better classification performance.
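
The gap between accuracy and AUC under class imbalance can be illustrated with a small numerical sketch. The sample below (95 negatives, 5 positives) and the helper names `accuracy` and `auc` are hypothetical and chosen only for illustration; they are not taken from the paper. The AUC is computed here as the fraction of (positive, negative) pairs that the score ranks correctly, with ties counted as one half, which is a standard empirical estimate.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return float(np.mean(y_true == y_pred))

def auc(y_true, scores):
    """Empirical AUC: share of (positive, negative) pairs in which the
    positive receives the higher score; ties count as 1/2."""
    pos = scores[y_true == 1]
    neg = scores[y_true == -1]
    diff = pos[:, None] - neg[None, :]  # all pairwise score differences
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Hypothetical imbalanced sample: 95 negatives and 5 positives.
y = np.array([-1] * 95 + [1] * 5)

# Trivial decision function f(x) = -1, so sign{f(x)} always predicts
# the majority class.
f_trivial = -np.ones(y.size)
y_hat = np.sign(f_trivial)

acc_trivial = accuracy(y, y_hat)  # high, despite missing every positive
auc_trivial = auc(y, f_trivial)   # no ranking ability at all

# A score that ranks every positive above every negative, for contrast.
f_perfect = y.astype(float)
auc_perfect = auc(y, f_perfect)

print(f"trivial: accuracy={acc_trivial:.2f}, AUC={auc_trivial:.2f}")
print(f"perfect ranking: AUC={auc_perfect:.2f}")
```

In this sketch the constant classifier reaches 95% accuracy while its AUC is 0.5, the value expected from a score that carries no ranking information. The pairwise computation uses memory proportional to the number of positive-negative pairs, so it is suited only to small samples like this one.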
