A Log-linear Gradient Descent Algorithm for Unbalanced Binary Classification using the All Pairs Squared Hinge Loss

Feb-21-2023–arXiv.org Artificial Intelligence

Binary classification is an important problem in many areas such as computer vision, natural language processing, and bioinformatics. Binary classification learning algorithms produce a function that outputs a real-valued predicted score, where a larger score means the example is more likely to belong to the positive class. The prediction accuracy of learned binary classification models can be quantified using the zero-one loss, which corresponds to thresholding the predicted score at zero. Because it considers only one prediction threshold (the default), this evaluation metric can be problematic or misleading in some cases, such as data sets with extreme class imbalance or models with different false positive rates. A more comprehensive and fair evaluation method uses the Receiver Operating Characteristic (ROC) curve, which plots the True Positive Rate versus the False Positive Rate over all thresholds of the predicted score [Egan and Egan, 1975]. The Area Under the ROC Curve (AUC) takes values between zero and one; constant predictions yield AUC = 0.5 (as do random, uninformed predictions, in expectation), and a set of perfect predictions achieves AUC = 1. It is therefore desirable to create learning algorithms that maximize AUC, and that criterion is often used for hyper-parameter selection. However, AUC cannot be used directly as a gradient descent objective, because it is a piecewise constant function of the predicted scores (its gradient is zero almost everywhere). Various authors have proposed to work around this issue by using convex relaxations of the Mann-Whitney statistic [Bamber, 1975], which involves a double sum over all pairs of positive and negative examples, so a direct computation takes time proportional to the product of the numbers of positive and negative examples.
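The contrast between the quadratic all-pairs double sum and a log-linear alternative can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes the squared hinge is applied to each score difference p - q (positive score minus negative score) with a margin of 1, and the function names and the `margin` parameter are hypothetical. It computes only the loss value, not its gradient. The AUC function shows the Mann-Whitney statistic itself, which changes only when the ordering of some pair of scores changes, hence the zero gradient almost everywhere.

```python
import bisect
from itertools import accumulate


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs ranked correctly, counting ties as one half. O(n_pos * n_neg)."""
    total = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def all_pairs_sq_hinge_naive(pos_scores, neg_scores, margin=1.0):
    """Sum over all (positive, negative) pairs of max(0, margin - (p - q))**2.
    Direct double sum: O(n_pos * n_neg) time. (Assumed loss form.)"""
    return sum(max(0.0, margin - (p - q)) ** 2
               for p in pos_scores for q in neg_scores)


def all_pairs_sq_hinge_sorted(pos_scores, neg_scores, margin=1.0):
    """Same loss in O((n_pos + n_neg) log n_neg) time.

    With a = q + margin, a pair is active when a > p and contributes
    (a - p)**2 = a**2 - 2*a*p + p**2. So for each positive p we only need the
    count, sum, and sum of squares of the shifted negatives a exceeding p,
    which sorted order plus prefix sums provide via one binary search.
    """
    shifted = sorted(q + margin for q in neg_scores)
    # Prefix sums with a leading zero: prefix[k] covers shifted[:k].
    prefix_sum = [0.0] + list(accumulate(shifted))
    prefix_sq = [0.0] + list(accumulate(a * a for a in shifted))
    n = len(shifted)
    total = 0.0
    for p in pos_scores:
        # First index whose shifted negative is strictly greater than p;
        # pairs with a == p contribute zero, so strictness does not matter.
        k = bisect.bisect_right(shifted, p)
        count = n - k
        s1 = prefix_sum[n] - prefix_sum[k]
        s2 = prefix_sq[n] - prefix_sq[k]
        total += s2 - 2.0 * p * s1 + count * p * p
    return total


if __name__ == "__main__":
    pos = [2.0, 0.5, 1.2]
    neg = [0.0, 0.7, -1.0, 1.5]
    print(auc_mann_whitney(pos, neg))           # 0.75 (9 of 12 pairs correct)
    print(all_pairs_sq_hinge_naive(pos, neg))   # approximately 7.88
    print(all_pairs_sq_hinge_sorted(pos, neg))  # approximately 7.88
```

Both loss functions agree; the sorted version replaces the inner loop over negatives with a single binary search, which is what makes a log-linear cost possible for the all-pairs loss.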
