The previous post concentrated on deciding whether a categorical machine learning model should be released to production. This post concentrates on interpreting the scores of a regression model and the implications of using it in a decision management feature. Decision management solutions take business rules written by humans and apply them automatically to the cases they are presented with. For example: if the last inspection was 30 days ago, then put the battery on the "inspection due" list. Digital masters are more likely to include the output of a machine learning prediction in their decision management systems than those who are just starting their digital journey.
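The battery rule mentioned above could be sketched roughly as follows. This is only an illustration: the battery names, the dates, and the helper `inspection_due` are hypothetical, and the rule is read here as "30 or more days since the last inspection", which is an assumption about the intended meaning.

```python
from datetime import date

# Interval taken from the rule in the text.
INSPECTION_INTERVAL_DAYS = 30


def inspection_due(last_inspection, today):
    """Return True when the business rule fires for one battery.

    Assumption: "30 days ago" is interpreted as "30 or more days ago".
    """
    return (today - last_inspection).days >= INSPECTION_INTERVAL_DAYS


# Hypothetical cases presented to the rule.
batteries = {
    "battery-A": date(2024, 1, 1),
    "battery-B": date(2024, 1, 25),
}
today = date(2024, 1, 31)

# Apply the rule automatically to every case.
inspection_due_list = [
    name for name, last in batteries.items() if inspection_due(last, today)
]
print(inspection_due_list)
```

A machine learning prediction could enter such a rule as one more input to the condition, but how it is combined with the hand-written rule is a design decision the text does not specify.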

Arnaud de Myttenaere, Boris Golden, Bénédicte Le Grand, Fabrice Rossi

We study in this paper the consequences of using the Mean Absolute Percentage Error (MAPE) as a measure of quality for regression models. We prove the existence of an optimal MAPE model and we show the universal consistency of Empirical Risk Minimization based on the MAPE. We also show that finding the best model under the MAPE is equivalent to doing weighted Mean Absolute Error (MAE) regression, and we apply this weighting strategy to kernel regression. The behavior of the MAPE kernel regression is illustrated on simulated data.
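The equivalence between the MAPE and a weighted MAE can be checked numerically with a small sketch. The data below is made up for illustration, the function names are hypothetical, and the weights are assumed to be 1/|y_i| for each target y_i (which requires every target to be non-zero).

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, as a fraction rather than a percent."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_mae(y_true, y_pred, weights):
    """Mean of the absolute errors, each scaled by its own weight."""
    return sum(
        w * abs(t - p) for t, p, w in zip(y_true, y_pred, weights)
    ) / len(y_true)


# Illustrative data only; targets must be non-zero.
y_true = [10.0, 20.0, 50.0, 200.0]
y_pred = [12.0, 18.0, 55.0, 150.0]

# Assumed weighting: each sample is weighted by 1 / |target|.
weights = [1.0 / abs(t) for t in y_true]

mape_value = mape(y_true, y_pred)
wmae_value = weighted_mae(y_true, y_pred, weights)
print(mape_value, wmae_value)
```

Both values agree because dividing each absolute error by |y_i| is the same as multiplying it by the weight 1/|y_i|. In practice this means samples with small targets dominate the loss.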

Someone recently asked on the statistics Stack Exchange why the squared error is used in statistics. This is something I'd been wondering about myself recently, so I decided to take a crack at answering it. The post below is adapted from that answer. It's true that one could choose to use, say, the absolute error instead of the squared error. In fact, the absolute error is often closer to what you "care about" when making predictions from your model.

By the end of the 50th epoch, training accuracy reaches 100% and validation accuracy 98.56%, which is impressive. Let's finally evaluate the performance of our classification model on the test set. Our model achieves an accuracy of 97.39% on the test set. Though this is slightly lower than the training accuracy of 100%, it is still very good given that we chose the number of layers and nodes arbitrarily. You can add more layers with more nodes to the model and see whether you get better results on the validation and test sets. In a regression problem, the goal is to predict a continuous value. In this section, you will see how to solve a regression problem with TensorFlow 2.0. The dataset for this problem can be downloaded freely from this link.

The target's distribution is right skewed, with some fairly high values compared to the mean.

The Root Mean Squared Error (RMSE) or Mean Squared Error (MSE, which is basically the same as RMSE without the square root) is the most popular regression metric. If there was a king/queen of regression metrics, this would have been it!

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

where $\hat{y}_i$ is the prediction, $y_i$ the actual target value and $n$ the total number of observations. In other words, you square all the errors (or residuals, as they are called) per sample/row, sum them, divide by the total number of observations and take the square root to bring the metric back to the original space (or you skip the square root for MSE). It is also one of the oldest regression metrics.

Smaller errors (for example, those below 1) contribute even less to the overall error after being squared, whereas bigger errors carry much more weight after being squared. A large error in a single sample can have a huge impact on the overall result and make an optimizer focus on reducing the error for that one sample, making the predictions for every other sample worse. The squaring also makes the metric easily differentiable, something that gradient-based algorithms (like Stochastic Gradient Descent) can leverage.
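The description of RMSE and MSE above can be illustrated with a minimal sketch. The numbers are invented for the example and the function names are hypothetical; the code only follows the verbal recipe (square the errors, average them, optionally take the square root).

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared per-sample errors."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: MSE brought back to the target's units."""
    return math.sqrt(mse(y_true, y_pred))


# Illustrative targets and two sets of predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_small = [3.5, 4.5, 2.5, 6.5]     # every sample is off by 0.5
y_outlier = [3.0, 5.0, 2.0, 17.0]  # three perfect samples, one off by 10

rmse_small = rmse(y_true, y_small)
rmse_outlier = rmse(y_true, y_outlier)
print(rmse_small, rmse_outlier)
```

In the first case each error of 0.5 shrinks to 0.25 once squared. In the second case a single error of 10 contributes 100 to the sum, so it determines the whole score even though the other three predictions are exact.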