Overcoming an Imbalanced Dataset using Oversampling.
How oversampling yielded great results for classifying cases of Sexual Harassment. When it comes to data science, sexual harassment is an imbalanced data problem, meaning there are few (known) instances of harassment in the entire dataset. An imbalanced problem is defined as a dataset which has disproportional class counts. Oversampling is one way to combat this by creating synthetic minority samples. SMOTE -- Synthetic Minority Over-sampling Technique -- is a common oversampling method widely used in machine learning with imbalanced high-dimensional datasets using Oversampling.
Jul-13-2020, 02:57:34 GMT
- Technology: