Decision trees are a type of recursive partitioning algorithm. Decision trees are built up of two types of nodes: decision nodes, and leaves. The decision tree starts with a node called the root. If the root is a leaf then the decision tree is trivial or degenerate and the same classification is made for all data. For decision nodes we examine a single variable and move to another node based on the outcome of a comparison.
The decision tree in the figure is just one of many decision tree structures you could create to solve the marketing problem. The task of finding the optimal decision tree is an intractable problem. For those of you who have taken an analysis of algorithms course, you no doubt recognize this term. For those of you who haven't had this pleasure (he says, gritting his teeth), essentially what this means is that as the amount of test data used to train the decision tree grows, the amount of time it takes to do so grows as well--exponentially. While it may be nearly impossible to find the smallest (or more fittingly, the shallowest) decision tree in a respectable amount of time, it is possible to find a decision tree that is "small enough" using special heuristics.
A decision tree is a supervised machine learning model used to predict a target by learning decision rules from features. As the name suggests, we can think of this model as breaking down our data by making a decision based on asking a series of questions. Let's consider the following example in which we use a decision tree to decide upon an activity on a particular day: Based on the features in our training set, the decision tree model learns a series of questions to infer the class labels of the samples. As we can see, decision trees are attractive models if we care about interpretability. Although the preceding figure illustrates the concept of a decision tree based on categorical targets (classification), the same concept applies if our targets are real numbers (regression).
When discussing classifiers, decision trees are often thought of as easily interpretable models when compared to numerous more complex classifiers, especially those of the blackbox variety. And this is generally true. This is especially true of rather comparatively simple models created from simple data. This is much-less true of complex decision trees crafted from large amounts of (high-dimensional) data. Even otherwise straightforward decision trees which are of great depth and/or breadth, consisting of heavy branching, can be difficult to trace.