Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions
Luo, Jiaqi, Yuan, Yuan, Xu, Shixin
arXiv.org Artificial Intelligence
However, like many machine learning algorithms, GBDT struggles with imbalanced datasets. Class imbalance is a persistent issue in many real-world applications, such as fraud detection [5], medical diagnosis [6], and fault diagnosis [7]. It degrades model performance, particularly when predicting the minority class. Various strategies have been proposed to address this challenge, including sampling techniques and algorithm modifications [8, 9]. While these methods have shown promise, class-balanced losses, particularly in multi-label classification, have received comparatively little attention. This paper presents the first comprehensive study on adapting class-balanced loss functions to GBDT algorithms across various tabular classification tasks, including binary, multi-class, and multi-label classification. We conduct extensive experiments on multiple datasets spanning diverse classification tasks, rigorously evaluating the performance of class-balanced losses within different GBDT models. Our results demonstrate the effectiveness of these loss functions in mitigating class imbalance issues in tree-based ensemble methods.
Jul-19-2024