Tree-based Machine Learning Models for Handling Imbalanced Datasets
Recently, I have been working on a binary classification problem with an imbalanced dataset, where the ratio of the positive class to the negative class is around 1:4. Imbalanced classification problems are so common that anyone working with data will run into them sooner or later. In this post, I will share three tree-based machine learning models that can help handle imbalanced datasets. To illustrate how effective these algorithms are, I will use the credit card fraud dataset from Kaggle. This dataset is extremely imbalanced: out of 284,807 transactions, only 492 are frauds. Following convention, we label the fraud samples as the positive class and the normal transactions as the negative class.
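To make the degree of imbalance concrete, here is a minimal sketch that works only from the two counts quoted above; it does not load the Kaggle file. The variable names are my own, and the majority-class baseline is just a back-of-the-envelope illustration, not one of the models discussed in this post.

```python
# Counts quoted in the text for the Kaggle credit card fraud dataset.
n_total = 284_807          # all transactions
n_pos = 492                # fraud (positive class)
n_neg = n_total - n_pos    # normal transactions (negative class)

# Share of positives and number of negatives per positive sample.
pos_fraction = n_pos / n_total
neg_per_pos = n_neg / n_pos

# Accuracy of a trivial classifier that always predicts "negative".
majority_baseline_accuracy = n_neg / n_total

print(f"positive fraction:          {pos_fraction:.4%}")
print(f"negatives per positive:     {neg_per_pos:.1f}")
print(f"always-negative accuracy:   {majority_baseline_accuracy:.4%}")
```

The positive class comes out at well under 1% of the data, so a model that never flags fraud would still score above 99% accuracy, which is why plain accuracy says very little on a dataset like this.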
May-4-2020, 09:58:32 GMT