Constructing classification trees using column generation
Firat, Murat, Crognier, Guillaume, Gabor, Adriana F., Zhang, Yingqian, Hurkens, C. A. J.
In classification problems, the goal is to decide the class membership of a set of observations, by using available information on features and class membership of a training data set. Decision trees are one of the most popular models for solving this problem, due to their effectiveness and high interpretability. In this work, we focus on constructing univariate binary decision trees of prespecified depth. In a univariate binary decision tree, each internal node contains a test regarding the value of one single feature of the data set, while the leaves contain the target classes. The problem of constructing (learning) a classification tree (CTCP), is the problem of finding a set of optimal tests (decision checks), such that the assignment of target classes to rows satisfies a certain criteria. A commonly encountered objective is accuracy, measured as the number of correct predictions in a training set. As the problem of learning optimal decision trees is an NPcomplete problem (Hyafil and Rivest 1976), heuristics such as CART (Breiman et al. 1984) and ID3 (Quinlan 1986) are widely used. These greedy algorithms build a tree recursively, starting from a single node.
Oct-15-2018