Classification trees are one of the most popular types of classifiers, with ease of implementation and interpretation being among their attractive features. Despite the widespread use of classification trees, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classification trees, called dyadic classification trees (DCTs), are near optimal (in a minimax sense) for a very broad range of classification problems. This demonstrates that other schemes (e.g., neural networks, support vector machines) cannot perform significantly better than DCTs in many cases. We also show that this near optimal performance is attained with linear (in the number of training data) complexity growing and pruning algorithms. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and which does not possess similar near optimality properties. Our analysis stems from theoretical results on structural risk minimization, on which the pruning rule for DCTs is based.
Regression and classification are both important concepts in data science, yet there are several dissimilarities between the two. In regression, the predicted outcome is a numeric variable, and a continuous one at that. In a classification task, the predicted outcome is not numeric; it represents categorical classes or factors, i.e., the outcome variable takes a limited number of values, which may be binary (dichotomous) or multinomial (having more than two classes). In our analysis, we are motivated to work only on the 'classification' scheme of tasks from the predictive analysis domain, keeping our focus not on regression trees but only on classification trees, even though the name 'Classification and Regression Trees' suggests both.
A decision tree is a supervised machine learning model used to predict a target by learning decision rules from features. As the name suggests, we can think of this model as breaking down our data by making a decision based on asking a series of questions. Consider an example in which we use a decision tree to decide on an activity on a particular day. Based on the features in our training set, the decision tree model learns a series of questions to infer the class labels of the samples. As we can see, decision trees are attractive models if we care about interpretability. Although the preceding figure illustrates the concept of a decision tree based on categorical targets (classification), the same concept applies if our targets are real numbers (regression).
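This question-asking process can be made concrete with a minimal sketch. The features (weather, humidity) and the activity labels below are hypothetical illustrations chosen for this example, not taken from a real dataset; the nested conditionals mirror the series of questions a fitted tree would ask:

```python
def predict_activity(sample):
    # Root question: examine a single feature first.
    if sample["weather"] == "sunny":
        # Follow-up question on a second feature.
        if sample["humidity"] == "high":
            return "stay in"      # leaf: class label
        return "go hiking"        # leaf
    if sample["weather"] == "rainy":
        return "read a book"      # leaf
    return "go running"           # leaf (e.g., overcast days)

print(predict_activity({"weather": "sunny", "humidity": "high"}))  # stay in
```

Each path from the first question to an answer corresponds to one branch of the tree, which is exactly what makes the model easy to interpret.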
Decision trees are a type of recursive partitioning algorithm. They are built up of two types of nodes: decision nodes and leaves. The decision tree starts with a node called the root. If the root is itself a leaf, then the decision tree is trivial or degenerate, and the same classification is made for all data. At a decision node, we examine a single variable and move to another node based on the outcome of a comparison.
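The two node types and the root-to-leaf traversal can be sketched as follows. This is a minimal illustration, assuming a hypothetical numeric feature "x" and made-up class labels; real trees are of course grown from data rather than built by hand:

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    label: str        # the classification made for all data reaching this leaf

@dataclass
class Decision:
    feature: str      # the single variable examined at this node
    threshold: float  # comparison: go left if value <= threshold, else right
    left: object      # child node (Leaf or Decision)
    right: object     # child node (Leaf or Decision)

def classify(node, sample):
    # Walk down from the root: each decision node compares one feature
    # and routes the sample to a child, until a leaf supplies the label.
    while isinstance(node, Decision):
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.label

# A tiny hand-built tree whose root splits on the hypothetical feature "x".
tree = Decision("x", 0.5, Leaf("class A"), Leaf("class B"))
print(classify(tree, {"x": 0.2}))  # class A
```

Note that the degenerate case mentioned above falls out naturally: if the root is a `Leaf`, the loop body never runs and every sample receives the same label.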