When applying machine learning to real world classification problems two complications that often arise are imbalanced classes (one class occurs much more often than the other (Kubat, Holte, & Matwin 1998; Ezawa, Singh, & Norton 1996; Fawcett & Provost 1996)) and asymmetric misclassification costs (the cost of misclassifying an example from one class is much larger than the cost of misclassifying an example from the other class (Domingos 1999; Pazzani et al. 1997)).
Many different metrics are used in machine learning and data mining to build and evaluate models. However, there is no general theory of machine learning metrics, that could answer questions such as: When we simultaneously want to optimise two criteria, how can or should they be traded off? Some metrics are inherently independent of class and misclassification cost distributions, while other are not -- can this be made more precise? This paper provides a derivation of ROC space from first principles through 3D ROC space and the skew ratio, and redefines metrics in these dimensions. The paper demonstrates that the graphical depiction of machine learning metrics by means of ROC isometrics gives many useful insights into the characteristics of these metrics, and provides a foundation on which a theory of machine learning metrics can be built.
Decision Trees are well known for their training efficiency and their interpretable knowledge representation. They apply a greedy search and a divide-and-conquer approach to learn patterns. The greedy search is based on the evaluation criterion on the candidate splits at each node. Although research has been performed on various such criteria, there is no significant improvement from the classical split approaches introduced in the early decision tree literature. This paper presents a new evaluation rule to determine candidate splits in decision tree classifiers. The experiments show that this new evaluation rule reduces the size of the resulting tree, while maintaining the tree's accuracy.
Decision trees and random forests are well established models that not only offer good predictive performance, but also provide rich feature importance information. While practitioners often employ variable importance methods that rely on this impurity-based information, these methods remain poorly characterized from a theoretical perspective. We provide novel insights into the performance of these methods by deriving finite sample performance guarantees in a high-dimensional setting under various modeling assumptions. We further demonstrate the effectiveness of these impurity-based methods via an extensive set of simulations.
One challenge of neural or deep architectures is that it is difficult to determine what exactly is going on in the machine learning algorithm that makes a classifier decide how to classify inputs. This is a huge problem in deep learning: we can get fantastic classification accuracies, but we don't really know what criteria a classifier uses to make its classification decision. However, decision trees can present us with a graphical representation of how the classifier reaches its decision.