Excluding variables from a logistic regression model based on correlation
To start with, logistic regression is often used when the cases of interest make up a small share of the data (on the order of 5%), like the small number of frauds in your case. The goal is to find patterns that let you flag fraud before it happens in the future. Second, when you say variables are correlated, they generally also carry similar information in a business sense. For example, Price and Discount are two correlated variables, yet they are different transformations of the same kind of data. So it makes sense to keep just one of them. Similarly, in your case, it's best to identify such groups of correlated variables and keep one variable from each.
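As a rough illustration of keeping just one variable from a correlated pair, here is a minimal sketch. The data is synthetic, the helper name `drop_correlated` is hypothetical, and the 0.9 cutoff is an arbitrary assumption rather than a recommended value. It keeps each column unless its absolute correlation with an already kept column exceeds the cutoff.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Greedily keep columns, skipping any that correlate strongly with a kept one.

    `threshold` is an assumed cutoff on absolute Pearson correlation.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not too correlated with any kept column.
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic example data (assumed, not from the question).
rng = np.random.default_rng(0)
price = rng.uniform(10, 100, size=500)
discount = 0.2 * price + rng.normal(0, 1, size=500)     # nearly a transformation of price
quantity = rng.integers(1, 20, size=500).astype(float)  # generated independently of price

X = np.column_stack([price, discount, quantity])
kept = drop_correlated(X, ["price", "discount", "quantity"])
print(kept)
```

Which variable of a pair survives depends on column order here, so in practice you would put the variable that makes more business sense first.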
Nov-2-2016, 22:00:38 GMT