In machine learning, performance metrics are numbers with an important story to tell, and they rely on you to give them a voice. Whether you are a non-technical person in sales, marketing, or operations, or come from a technical background such as data science, engineering, or development, it is equally important to understand how performance metrics work for machine learning.

Classifier metrics are metrics used to evaluate the performance of machine learning classifiers: models that put each example into one of several discrete categories. A confusion matrix is a table that compares a classifier's predicted labels with the actual labels. For a binary classifier it contains four cells, each corresponding to one combination of a predicted positive or negative and an actual positive or negative. Many classifier metrics are based on the confusion matrix, so it's helpful to keep an image of it stored in your mind. Sensitivity (recall) is the fraction of actual positives that were correctly predicted as positive, that is, true positives divided by the sum of true positives and false negatives.
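As a minimal sketch of the confusion matrix for a binary classifier, the snippet below counts the four cells and derives sensitivity (recall) from them. The function names and the toy labels are hypothetical and chosen only for illustration; they do not come from any particular library.

```python
def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells for binary labels (True = positive)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted positive, actually positive
        elif p and not a:
            fp += 1      # predicted positive, actually negative
        elif not p and not a:
            tn += 1      # predicted negative, actually negative
        else:
            fn += 1      # predicted negative, actually positive
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def recall(counts):
    """Sensitivity/recall: share of actual positives that were predicted positive."""
    actual_positives = counts["tp"] + counts["fn"]
    return counts["tp"] / actual_positives if actual_positives else 0.0


# Toy labels, made up for illustration only.
actual = [True, True, True, False, False, False, True, False]
predicted = [True, False, True, False, True, False, True, False]

counts = confusion_counts(actual, predicted)
print(counts)
print(recall(counts))
```

Returning 0.0 when there are no actual positives is a design choice for this sketch; depending on the application, raising an error or reporting the value as undefined may be more appropriate.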

The four cells of the confusion matrix are true positives, false positives (type 1 errors), true negatives, and false negatives (type 2 errors). There are many metrics for determining model performance for regression problems, but the most commonly used metric is the mean squared error (MSE), or a variation called the root mean squared error (RMSE), which is calculated by taking the square root of the mean squared error. Returning to the four possible results from a binary classifier, precision (positive predictive value) is the ratio of true positives to the total number of positive predictions made, whether correct or not (i.e., true positives plus false positives).
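The definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the helper names and the toy numbers are assumptions made for the example.

```python
import math


def mean_squared_error(actual, predicted):
    """MSE: the average of the squared differences between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


def root_mean_squared_error(actual, predicted):
    """RMSE: the square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


def precision(tp, fp):
    """Precision: true positives divided by all positive predictions (tp + fp)."""
    positive_predictions = tp + fp
    return tp / positive_predictions if positive_predictions else 0.0


# Toy regression values, made up for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

print(mean_squared_error(y_true, y_pred))
print(root_mean_squared_error(y_true, y_pred))

# Toy classifier counts: 3 true positives and 1 false positive.
print(precision(tp=3, fp=1))
```

Because RMSE is in the same units as the quantity being predicted, it is often easier to interpret than MSE, even though both rank models the same way.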

Welcome to the fourth article in a five-part series about machine learning. In this article, we will take a deeper dive into model evaluation and performance metrics, and into the prediction-related errors that one may encounter. Before digging deeper into model performance and error types, we must first discuss the concepts of residuals and errors for regression, positive and negative classifications for classification problems, and in-sample versus out-of-sample measurements. Any reference to models, metrics, or errors computed with respect to the data used to train, validate, or tune a predictive model (i.e., data you have) is called in-sample. Conversely, any reference to metrics and errors computed on test data, or on new data in general (i.e., data you don't have), is called out-of-sample.
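To make the in-sample versus out-of-sample distinction concrete, here is a small sketch that fits a straight line by ordinary least squares on training data and then measures the mean squared residual on both the training data (in-sample) and held-out data (out-of-sample). The choice of a simple linear model, the helper names, and the toy data points are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_residual(xs, ys, slope, intercept):
    """Average squared residual (actual minus predicted) on the given data."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals) / len(residuals)


# Toy data, made up for illustration only.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]   # data used to fit the model
test_x, test_y = [5.0, 6.0], [10.5, 11.4]                       # held-out data

slope, intercept = fit_line(train_x, train_y)

in_sample_mse = mean_squared_residual(train_x, train_y, slope, intercept)
out_of_sample_mse = mean_squared_residual(test_x, test_y, slope, intercept)

print("in-sample MSE:", in_sample_mse)
print("out-of-sample MSE:", out_of_sample_mse)
```

In this toy example the out-of-sample error is larger than the in-sample error, which is the usual pattern: a model is tuned to the data it has seen, so in-sample metrics tend to be optimistic.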