This is my favorite evaluation metric and I tend to use this a lot in my classification projects. The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall. Let us start with a binary prediction problem. We are predicting if an asteroid will hit the earth or not. So if we say "No" for the whole training set.
The metrics that you choose to evaluate your machine learning algorithms are very important. Choice of metrics influences how the performance of machine learning algorithms is measured and compared. They influence how you weight the importance of different characteristics in the results and your ultimate choice of which algorithm to choose. In this post you will discover how to select and use different machine learning performance metrics in Python with scikit-learn. Metrics To Evaluate Machine Learning Algorithms in Python Photo by Ferrous Büller, some rights reserved.
How will we evaluate the performance of our Model? The gold standard here is the train-test-validation split. Frequently making a train-validation-test set, by sampling, we forgot about an implicit assumption -- Data is rarely ever IID(independently and identically distributed). In simple terms, our assumption that each data point is independent of each other and comes from the same distribution is faulty at best if not downright incorrect. For an internet company, a data point from 2007 is very different from a data point that comes in 2019.
This article was originally published in February 2016 and updated in August 2019. The idea of building machine learning models works on a constructive feedback principle. You build a model, get feedback from metrics, make improvements and continue until you achieve a desirable accuracy. Evaluation metrics explain the performance of a model. An important aspect of evaluation metrics is their capability to discriminate among model results. I have seen plenty of analysts and aspiring data scientists not even bothering to check how robust their model is. Once they are finished building a model, they hurriedly map predicted values on unseen data. This is an incorrect approach. Simply building a predictive model is not your motive. It's about creating and selecting a model which gives high accuracy on out of sample data.