Decision trees as partitioning machines to characterize their generalization properties

Neural Information Processing Systems 

Decision trees are popular machine learning models that are simple to build and easy to interpret. Even though algorithms to learn decision trees date back to almost 50 years, key properties affecting their generalization error are still weakly bounded. Hence, we revisit binary decision trees on real-valued features from the perspective of partitions of the data. We introduce the notion of partitioning function, and we relate it to the growth function and to the VC dimension. Using this new concept, we are able to find the exact VC dimension of decision stumps, which is given by the largest integer d such that 2\ell \ge \binom{d}{\floor{\frac{d}{2}}}, where \ell is the number of real-valued features.