Data Mining and Machine Learning -- Credibility of the trained model
When we train a machine learning algorithm, we have to be credible: we must be able to show that it actually performs better than another model. In this article we will see the techniques for doing this. How should I split the data in my dataset so that I get correct estimates of my ML algorithm's performance? We will also look at this in terms of confidence limits, and we will see how to compare two classifiers.

In general no technique is better than all the others, so you have to compare the different approaches. Depending on the problem under examination, some approaches are preferable to others. For example, if we have a binary problem, our choice will probably fall on a binary classifier. Even so, without statistical tests we have no guarantee that one technique works better than another.

Another important aspect is that, in many contexts, not all errors carry the same weight. In the medical field, for example, saying that a sick patient is healthy is a far more costly error than the opposite one. We initially assume the same cost for every type of error, which amounts to ignoring costs altogether. Later we will also see cost-sensitive performance evaluation of our scheme (model).

Evaluation is not limited to classification. While a classifier outputs only labels or probabilities, we can also evaluate numeric predictions, for example predicting the performance of a processor.

Finally, although in principle I could measure performance using the errors on the training set, doing so is a mistake: the training-set error is not a good indicator of performance on future data.
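To make the last point concrete, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical synthetic dataset (one feature `x`, label 1 when `x > 0.5`, with 20% of labels flipped as noise) and uses a 1-nearest-neighbour classifier, which memorizes its training data. The data is split with a simple holdout (two thirds for training, one third for testing). The confidence interval uses the plain normal-approximation (Wald) formula for a proportion. This is an illustrative choice; it is not necessarily the exact formula used for confidence limits later in the article. All function names are hypothetical.

```python
import math
import random


def make_data(n, noise, seed):
    """Hypothetical 1-D binary problem: label is 1 when x > 0.5,
    but a fraction `noise` of the labels is flipped at random."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def holdout_split(data, test_fraction, seed):
    """Shuffle a copy of the data and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def error_rate(train, data):
    """Fraction of points in `data` misclassified by 1-NN fitted on `train`."""
    errors = sum(predict_1nn(train, x) != y for x, y in data)
    return errors / len(data)


def accuracy_interval(acc, n, z=1.96):
    """Normal-approximation (Wald) interval for an accuracy measured on n
    independent test points; z = 1.96 gives roughly 95% confidence."""
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


data = make_data(n=1000, noise=0.2, seed=0)
train, test = holdout_split(data, test_fraction=1 / 3, seed=1)

train_err = error_rate(train, train)  # each point finds itself: optimistic
test_err = error_rate(train, test)    # measured on data the model never saw
low, high = accuracy_interval(1 - test_err, len(test))

print(f"training error: {train_err:.3f}")
print(f"test error:     {test_err:.3f}")
print(f"95% CI for test accuracy: [{low:.3f}, {high:.3f}]")
```

Because every training point is its own nearest neighbour, the training error comes out as 0. The holdout error, on the other hand, reflects the label noise, so only the test-set figure (with its confidence interval) says anything useful about future data.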
Oct-11-2020, 15:01:40 GMT