Class-Weighted Evaluation Metrics for Imbalanced Data Classification
Akhilesh Gupta, Nesime Tatbul, Ryan Marcus, Shengtian Zhou, Insup Lee, Justin Gottschlich
Class distribution skews in imbalanced datasets may lead to models with prediction bias towards majority classes, making fair assessment of classifiers a challenging task. Balanced Accuracy is a popular metric used to evaluate a classifier's prediction performance under such scenarios. However, this metric falls short when classes vary in importance, especially when class importance is skewed differently from class cardinality distributions. In this paper, we propose a simple and general-purpose evaluation framework for imbalanced data classification that is sensitive to arbitrary skews in class cardinalities and importances. Experiments with several state-of-the-art classifiers tested on real-world datasets and benchmarks from two different domains show that our new framework is more effective than Balanced Accuracy, not only in evaluating and ranking model predictions, but also in training the models themselves.

For a broad range of machine learning (ML) tasks, predictive modeling in the presence of imbalanced datasets (those with severe distribution skews) has been a longstanding problem (He & Garcia, 2009; Sun et al., 2009; He & Ma, 2013; Branco et al., 2016; Hilario et al., 2018; Johnson & Khoshgoftaar, 2019). Imbalanced training datasets lead to models with prediction bias towards majority classes, which in turn results in misclassification of the underrepresented ones.
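Balanced Accuracy is the unweighted mean of per-class recalls, so every class contributes equally regardless of how much it matters to the application. The short Python/NumPy sketch below contrasts that metric with an illustrative class-importance-weighted variant; the weighted form and the `class_importance` weights are assumptions made for illustration only and are not necessarily the paper's exact formulation.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Standard Balanced Accuracy: the unweighted mean of per-class recalls."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

def importance_weighted_accuracy(y_true, y_pred, class_importance):
    """Illustrative class-importance-weighted variant (an assumed form, not
    necessarily the paper's metric): each class's recall is weighted by a
    user-supplied importance, normalized by the total importance."""
    classes = np.unique(y_true)
    weights = np.array([class_importance[int(c)] for c in classes], dtype=float)
    recalls = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
    return float(np.sum(weights * recalls) / np.sum(weights))

# Toy example: class 2 is rare but highly important, and it is the one class
# the classifier misses entirely.
y_true = np.array([0] * 90 + [1] * 8 + [2] * 2)
y_pred = np.array([0] * 90 + [1] * 8 + [0] * 2)

print(balanced_accuracy(y_true, y_pred))                                 # ~0.67
print(importance_weighted_accuracy(y_true, y_pred, {0: 1, 1: 1, 2: 5}))  # ~0.29
```

With equal importances the two functions coincide; the gap between the two scores in the example is the kind of importance skew that an evaluation framework sensitive to both class cardinality and class importance is meant to expose.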
arXiv.org Artificial Intelligence
Oct-12-2020