After doing the usual Feature Engineering, Selection, and of course, implementing a model and getting some output in forms of a probability or a class, the next step is to find out how effective is the model based on some metric using test datasets. Different performance metrics are used to evaluate different Machine Learning Algorithms. For now, we will be focusing on the ones used for Classification problems. We can use classification performance metrics such as Log-Loss, Accuracy, AUC(Area under Curve) etc. Another example of metric for evaluation of machine learning algorithms is precision, recall, which can be used for sorting algorithms primarily used by search engines. The metrics that you choose to evaluate your machine learning model is very important.

You are in a conference room, presenting your work on a classification problem. You demonstrate all the magic you performed with feature engineering, predictor selection, model selection, hyperparameter tuning, and ensembling. You conclude your presentation with the predicted probabilities and the ROC curve, and a fantastic AUC. You sit down confident in a job well done. And the manager says, "What am I supposed to do with all this probability and ROC stuff? I just want to know if I should do x or y."

This is the first of three articles about performance measures and graphs for binary learning models in Azure ML. Binary learning models are models which just predict one of two outcomes: positive or negative. These models are very well suited to drive decisions, such as whether to administer a patient a certain drug or to include a lead in a targeted marketing campaign. This first article lays the foundation by covering several statistical measures: accuracy, precision, recall and F1 score, These measures require a solid understanding of the two types of prediction errors which we will also cover: false positives and false negatives. In the second article we'll discuss the ROC curve and the related AUC measure.

A classifier is only as good as the metric used to evaluate it. If you choose the wrong metric to evaluate your models, you are likely to choose a poor model, or in the worst case, be misled about the expected performance of your model. Choosing an appropriate metric is challenging generally in applied machine learning, but is particularly difficult for imbalanced classification problems. Firstly, because most of the standard metrics that are widely used assume a balanced class distribution, and because typically not all classes, and therefore, not all prediction errors, are equal for imbalanced classification. In this tutorial, you will discover metrics that you can use for imbalanced classification. Tour of Evaluation Metrics for Imbalanced Classification Photo by Travis Wise, some rights reserved.

Classification accuracy is the total number of correct predictions divided by the total number of predictions made for a dataset. As a performance measure, accuracy is inappropriate for imbalanced classification problems. The main reason is that the overwhelming number of examples from the majority class (or classes) will overwhelm the number of examples in the minority class, meaning that even unskillful models can achieve accuracy scores of 90 percent, or 99 percent, depending on how severe the class imbalance happens to be. An alternative to using classification accuracy is to use precision and recall metrics. In this tutorial, you will discover how to calculate and develop an intuition for precision and recall for imbalanced classification.