There are many metrics to measure the performance of your model depending on the type of machine learning you are looking to conduct. In this article, we take a look at performance measures for classification and regression models and discuss which is better optimized. Sometimes the metric to look at will vary according to the problem that is initially being solved. The True Positive Rate also called Recall is the go-to performance measure in binary/non-binary classification problems. Most if not all the time, we are only interested in correctly predicting one class.

In supervised learning, algorithms learn from labeled data. After understanding the data, the algorithm determines which label should be given to new data by associating patterns to the unlabeled new data. Supervised learning can be divided into two categories: classification and regression. Some examples of classification include spam detection, churn prediction, sentiment analysis, dog breed detection and so on. Some examples of regression include house price prediction, stock price prediction, height-weight prediction and so on.

Choosing the right metric is crucial while evaluating machine learning (ML) models. Various metrics are proposed to evaluate ML models in different applications, and I thought it may be helpful to provide a summary of popular metrics in a here, for better understanding of each metric and the applications they can be used for. In some applications looking at a single metric may not give you the whole picture of the problem you are solving, and you may want to use a subset of the metrics discussed in this post to have a concrete evaluation of your models. Here, I provide a summary of 20 metrics used for evaluating machine learning models. There is no need to mention that there are various other metrics used in some applications (FDR, FOR, hit@k, etc.), which I am skipping here.