Goto

Collaborating Authors

 cost-sensitive logistic regression


Cost-Sensitive Logistic Regression for Imbalanced Classification

#artificialintelligence

Logistic regression does not support imbalanced classification directly. Instead, the training algorithm used to fit the logistic regression model must be modified to take the skewed distribution into account. This can be achieved by specifying a class weighting configuration that is used to influence the amount that logistic regression coefficients are updated during training. The weighting can penalize the model less for errors made on examples from the majority class and penalize the model more for errors made on examples from the minority class. The result is a version of logistic regression that performs better on imbalanced classification tasks, generally referred to as cost-sensitive or weighted logistic regression.


A Descriptive Study of Variable Discretization and Cost-Sensitive Logistic Regression on Imbalanced Credit Data

arXiv.org Machine Learning

Training classification models on imbalanced data sets tends to result in bias towards the majority class. In this paper, we demonstrate how the variable discretization and Cost-Sensitive Logistic Regression help mitigate this bias on an imbalanced credit scoring data set. 10-fold cross-validation is used as the evaluation method, and the performance measurements are ROC curves and the associated Area Under the Curve. The results show that good variable discretization and Cost-Sensitive Logistic Regression with the best class weight can reduce the model bias and/or variance. It is also shown that effective variable selection helps reduce the model variance. From the algorithm perspective, Cost-Sensitive Logistic Regression is beneficial for increasing the prediction ability of predictors even if they are not in their best forms and keeping the multivariate effect and univariate effect of predictors consistent. From the predictors perspective, the variable discretization performs slightly better than Cost-Sensitive Logistic Regression, provides more reasonable coefficient estimates for predictors which have nonlinear relationship against their empirical logit, and is robust to penalty weights of misclassifications of events and non-events determined by their proportions.