How to check/optimize cross-validation with random forest on imbalanced classes?
Your data set is a bit small. The classic solution is to over-sample the under-represented classes. I do this routinely, but on data sets with 50 million observations, where the class "fraud" (versus "non-fraud") represented only 4 out of 10,000 observations. If you can get a much bigger data set, that would help. Also, with such a small and unbalanced data set, I would use fewer than 5 predictors.
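When you combine over-sampling with cross-validation, over-sample only the training part of each fold. If you over-sample before splitting, copies of the same minority row can land in both the training and test folds, which inflates the cross-validated score. Below is a minimal stdlib-only sketch of that idea. It uses plain random over-sampling (duplicating minority rows), which is one of several possible techniques. The function names (`random_oversample`, `stratified_folds`, `cv_splits_with_oversampling`) are hypothetical helpers written for illustration. The balanced training fold could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`, but that step is not shown here.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw (with replacement) the extra rows needed to reach the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def stratified_folds(y, k=5, seed=0):
    """Assign each index to one of k folds, spreading every class evenly
    so each test fold keeps roughly the original class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def cv_splits_with_oversampling(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test) for each fold.
    Only the training part is over-sampled; the test fold is left untouched."""
    folds = stratified_folds(y, k, seed)
    for f, test_idx in enumerate(folds):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr, y_tr = random_oversample(
            [X[i] for i in train_idx], [y[i] for i in train_idx], seed + f
        )
        X_te = [X[i] for i in test_idx]
        y_te = [y[i] for i in test_idx]
        yield X_tr, y_tr, X_te, y_te


if __name__ == "__main__":
    # Toy data: 5 positives out of 100 rows, one made-up feature per row.
    X = [[float(i)] for i in range(100)]
    y = [1 if i < 5 else 0 for i in range(100)]
    for X_tr, y_tr, X_te, y_te in cv_splits_with_oversampling(X, y, k=5):
        print("train:", Counter(y_tr), "test:", Counter(y_te))
```

With this toy data each training fold ends up balanced (76 rows per class after duplication), while each test fold keeps the original imbalance (19 negatives, 1 positive). Because the test folds are not resampled, it is worth scoring them with something more informative than plain accuracy, such as per-class recall. Note also that with only one positive per test fold, the per-fold scores will be very noisy, which is another reason a bigger data set would help.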
May-27-2018, 23:00:24 GMT