Theoretical and experimental study of SMOTE: limitations and comparisons of rebalancing strategies

Sakho, Abdoulaye; Scornet, Erwan; Malherbe, Emmanuel

Feb-6-2024–arXiv.org Artificial Intelligence

Imbalanced data sets are a common problem in many practical applications (He and Garcia, 2009), such as fraud detection (Hassan and Abraham, 2016), medical diagnosis (Khalilia et al., 2011) and churn detection (Nguyen and Duong, 2021). In the presence of imbalanced data, most machine learning algorithms tend to predict the majority class, which leads to biased predictions. Several strategies have been developed to handle this issue, as reviewed by Krawczyk (2016) and Ramyachitra and Manikandan (2014). These strategies fall into two categories: model-level approaches and data-level approaches. Model-level approaches address the problem by acting directly on the machine learning algorithms themselves.
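
The minimal sketch below, in plain Python with no external library, illustrates the ideas in the paragraph above on hypothetical toy data. An always-majority predictor reaches high accuracy while missing every minority example. Inverse-frequency class weights stand in as one example of a model-level approach; the formula n / (n_classes * n_c) is a common convention (the same one scikit-learn documents for `class_weight="balanced"`). A SMOTE-style interpolation between minority neighbours stands in as one example of a data-level approach. The names `inverse_frequency_weights` and `smote_like`, the parameters `k` and `seed`, and the data are assumptions made for illustration; this is not the paper's code.

```python
import math
import random
from collections import Counter

# Hypothetical toy binary labels: 95 majority (0) vs 5 minority (1).
y = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
y_pred = [0] * len(y)
accuracy = sum(p == t for p, t in zip(y_pred, y)) / len(y)
minority_recall = sum(p == t == 1 for p, t in zip(y_pred, y)) / y.count(1)
print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
# accuracy=0.95, minority recall=0.00


def inverse_frequency_weights(labels):
    """Model-level idea (hypothetical helper): weight class c by n / (n_classes * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def smote_like(minority, n_new, k=3, seed=0):
    """Data-level idea (hypothetical helper): each synthetic point lies on the segment
    between a random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        # Indices of the other minority points, sorted by Euclidean distance to point i.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(minority[i], minority[j]))
        j = rng.choice(others[:k])
        lam = rng.random()  # uniform draw in [0, 1)
        x, z = minority[i], minority[j]
        synthetic.append(tuple(xi + lam * (zi - xi) for xi, zi in zip(x, z)))
    return synthetic


weights = inverse_frequency_weights(y)
print(weights)  # class 1 gets weight 10.0, class 0 about 0.526

# Hypothetical 2-D minority samples; 90 synthetic points bring the minority to 95.
minority_X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]
new_points = smote_like(minority_X, n_new=90)
print(len(minority_X) + len(new_points))  # 95, matching the majority count
```

With only five minority points, `k=3` is used here instead of the frequently used default of 5 neighbours; each synthetic point stays inside the region spanned by existing minority samples.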

Country:
- Europe
  - France > Île-de-France
    - Paris > Paris (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
- Africa > South Africa
  - KwaZulu-Natal > Pietermaritzburg (0.04)

Genre:
- Research Report
  - New Finding (0.50)
  - Experimental Study (0.50)

Industry:
- Law Enforcement & Public Safety > Fraud (0.34)
- Health & Medicine > Diagnostic Medicine (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)