This article was published as a part of the Data Science Blogathon. So you have successfully built your classification model. What should you do now? How do you evaluate the model's performance, that is, how well it predicts the outcome? To answer these questions, let's look at the metrics used to evaluate a classification model through a simple case study.
An evaluation metric measures the performance of a machine learning model, so choosing the right one is essential. This article will cover the metrics used in classification and regression machine learning models. For a classification algorithm, the output of the model can be either a target class label or a probability score, and different evaluation metrics are used for these two kinds of output.
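To make the label-versus-probability distinction concrete, here is a minimal sketch in plain Python. The toy data, the 0.5 cut-off, and the helper names `accuracy` and `log_loss` are illustrative assumptions, not taken from the article. Accuracy scores hard class labels, while log loss scores the predicted probabilities directly.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predicted class labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels, given P(class = 1)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical toy data: true labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1]

# Turning probabilities into labels needs a threshold (0.5 is assumed here).
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

acc = accuracy(y_true, y_pred)   # metric for label output
ll = log_loss(y_true, y_prob)    # metric for probability output
print(f"accuracy = {acc:.2f}, log loss = {ll:.4f}")
```

Note that accuracy only sees which side of the threshold each score falls on, whereas log loss also penalises a correct but under-confident prediction such as 0.6.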
You've built your machine learning model, so what's next? You need to evaluate it and validate how good (or bad) it is, so you can then decide whether to implement it. That's where the AUC-ROC curve comes in. The name might be a mouthful, but it just says that we are calculating the "Area Under the Curve" (AUC) of the "Receiver Operating Characteristic" (ROC) curve. I have been in your shoes.
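As a rough illustration of what the AUC number means, the sketch below computes it in plain Python using the pairwise formulation: the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting as half. The function name `roc_auc` and the toy data are assumptions made for this example.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: true labels and model scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.2f}")
```

This quadratic loop is fine for small data. For real workloads a library routine would normally be used instead.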
Many fields use the ROC curve and the PR curve as standard evaluations of binary classification methods. Analysis of ROC and PR, however, often gives misleading and inflated performance evaluations, especially with an imbalanced ground truth. Here, we demonstrate the problems with ROC and PR analysis through simulations, and propose the MCC-F1 curve to address these drawbacks. The MCC-F1 curve combines two informative single-threshold metrics, MCC and the F1 score. The MCC-F1 curve more clearly differentiates good and bad classifiers, even with imbalanced ground truths. We also introduce the MCC-F1 metric, which provides a single value that integrates many aspects of classifier performance across the whole range of classification thresholds. Finally, we provide an R package that plots MCC-F1 curves and calculates related metrics.
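The two single-threshold metrics named above can be sketched in plain Python as follows. This is only an illustration of how MCC and the F1 score are computed from a confusion matrix at each threshold; it does not reproduce the authors' R package, any normalisation they apply, or the MCC-F1 metric itself. The helper names and toy data are assumptions.

```python
import math

def confusion(y_true, y_score, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for t, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy data: true labels and model scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (MCC, F1) point per threshold; sweeping thresholds traces out a curve.
points = []
for thr in (0.3, 0.5):
    tp, fp, tn, fn = confusion(y_true, y_score, thr)
    points.append((thr, mcc(tp, fp, tn, fn), f1_score(tp, fp, fn)))

for thr, m, f in points:
    print(f"threshold={thr}: MCC={m:.3f}, F1={f:.3f}")
```

Unlike F1, MCC uses all four cells of the confusion matrix, including true negatives, which is one reason the pair can be informative together on imbalanced data.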