Class Imbalance and Oversampling - Dr. Shahin Rostami
In this article we introduce the problem of dataset class imbalance, which often occurs in real-world classification problems. We then look at oversampling as a possible solution and provide a coded example as a demonstration on an imbalanced dataset.

Let's assume we have a dataset where the data points are classified into two categories: Class A and Class B. In an ideal scenario the data points would be divided equally between the two categories. With such a balanced split we could sufficiently measure the performance of a classification model using classification accuracy. Let's say that Class B represents the suspect category, e.g. a weapon, disease or fraud has been detected. If a model scored a classification accuracy of 90%, we may decide we're happy.
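The article's own coded example is not reproduced in this excerpt, so the following is only a minimal sketch of the idea, not the author's code. It assumes a hypothetical 90/10 split between Class A (label 0) and Class B (label 1), and uses plain NumPy for a simple random oversampling helper (`random_oversample` and `accuracy` are illustrative names, not library functions). It shows why a 90% accuracy can be misleading once the classes are imbalanced, and how duplicating minority samples evens out the class counts.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples of each smaller class (with replacement)
    until every class has as many samples as the largest class."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()

    # Start with every original sample, then append the extra duplicates.
    keep = [np.arange(len(y))]
    for label, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == label)
            keep.append(rng.choice(idx, size=target - count, replace=True))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


# Hypothetical imbalanced dataset: 90 points of Class A (0), 10 of Class B (1).
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100, dtype=float).reshape(-1, 1)

# A "model" that always predicts Class A never detects Class B,
# yet it still matches 90 of the 100 labels.
always_a = np.zeros_like(y)
baseline_accuracy = accuracy(y, always_a)

# After oversampling, both classes are equally represented.
X_bal, y_bal = random_oversample(X, y)
balanced_counts = np.bincount(y_bal)

print(f"Accuracy of always predicting Class A: {baseline_accuracy:.2f}")
print(f"Class counts after oversampling: {balanced_counts.tolist()}")
```

In practice, oversampling of this kind is usually applied to the training split only, so that duplicated points do not leak into the data used for evaluation.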
Jan-4-2019, 21:28:38 GMT