Fonseca, Pedro G., Lopes, Hugo D.

Binary classification is highly used in credit scoring in the estimation of probability of default. The validation of such predictive models is based both on rank ability, and also on calibration (i.e. how accurately the probabilities output by the model map to the observed probabilities). In this study we cover the current best practices regarding calibration for binary classification, and explore how different approaches yield different results on real world credit scoring data. The limitations of evaluating credit scoring models using only rank ability metrics are explored. A benchmark is run on 18 real world datasets, and results compared. The calibration techniques used are Platt Scaling and Isotonic Regression. Also, different machine learning models are used: Logistic Regression, Random Forest Classifiers, and Gradient Boosting Classifiers. Results show that when the dataset is treated as a time series, the use of re-calibration with Isotonic Regression is able to improve the long term calibration better than the alternative methods. Using re-calibration, the non-parametric models are able to outperform the Logistic Regression on Brier Score Loss.

A logistic regression model is said to be statistically significant only when the p-Values are less than the pre-determined statistical significance level, which is ideally 0.05. The p-value for each coefficient is represented as a probability Pr( z). We see here that both the coefficients have a very low p-value which means that both the coefficients are essential in computing the response variable. The stars corresponding to the p-values indicate the significance of that respective variable. Since in our model, both the p values have a 3 star, this indicates that both the variables are extremely significant in predicting the response variable.

Logistic Regression is the most widely used classification algorithm in machine learning. It is used in many real-world scenarios like spam detected, cancer detection, IRIS dataset, etc. Mostly it is used in binary classification problems. But it can also be used in multiclass classification. Logistic Regression predicts the probability that the given data point belongs to a certain class or not. In this article, I will be using the famous heart disease dataset from Kaggle.

In our previous chapters, we mainly discussed about the Linear Regression model where the target variable to be predicted is continous in nature and there is a linear relationship between the independent and target varables. But how to predict a discrete varaible based uopn the predictors which are linearly related with the target. In this case Logistic Regression comes to rescue. In this article, we will mainly focus on this predictive model and know the inner engineering of this model. So, What is Logistic Regression?