How to check/optimize cross-validation with random forest on imbalanced classes?

@machinelearnbot

Your data set is a bit small. The classic solution is to over-sample the under-represented classes. I've been doing it routinely, but on data sets with 50 million observations, where the class "fraud" (versus "non fraud") represented only 4 out of 10,000 observations. If you can get a much bigger data set, that would help. Also, with such a small yet unbalanced data set, I would use fewer than 5 predictors.
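To make the over-sampling idea concrete, here is a minimal sketch in plain Python. It is only an illustration: the function name `oversample`, the seed, and the toy data are made up for this example, and it uses simple random duplication with replacement, which is just one way to over-sample. When combined with cross-validation, it is generally safer to apply it to the training part of each fold only, so that duplicated rows do not leak into the held-out part.

```python
import random
from collections import Counter


def oversample(features, labels, seed=0):
    """Randomly duplicate rows of under-represented classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw the missing rows with replacement from the existing ones.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)

    # Shuffle so the classes are not stored in contiguous blocks.
    order = list(range(len(out_x)))
    rng.shuffle(order)
    return [out_x[i] for i in order], [out_y[i] for i in order]


# Toy data (made-up values): 8 "non fraud" rows and 2 "fraud" rows.
X = [[i, i * 2] for i in range(10)]
y = ["non fraud"] * 8 + ["fraud"] * 2

X_bal, y_bal = oversample(X, y)
print(Counter(y_bal))
```

After the call, both classes have the same number of rows, and every added "fraud" row is a copy of one of the original two. On a data set this small, those copies carry no new information, which is why a bigger data set and fewer predictors still matter.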