Alternating optimization of decision trees, with application to learning sparse oblique trees

Miguel A. Carreira-Perpinan, Pooya Tavallali

Neural Information Processing Systems 

Learning a decision tree from data is a difficult optimization problem. The most widespread algorithm in practice, dating to the 1980s, is based on a greedy growth of the tree structure by recursively splitting nodes, and possibly pruning back the final tree. The parameters (decision function) of an internal node are approximately estimated by minimizing an impurity measure. We give an algorithm that, given an input tree (its structure and the parameter values at its nodes), produces a new tree with the same or smaller structure but new parameter values that provably lower or leave unchanged the misclassification error. This can be applied to both axis-aligned and oblique trees, and our experiments show it consistently outperforms various other algorithms while being highly scalable to large datasets and trees. Further, the same algorithm can handle a sparsity penalty, so it can learn sparse oblique trees, having a structure that is a subset of the original tree and few nonzero parameters. This combines the best of axis-aligned and oblique trees: flexibility to model correlated data, low generalization error, fast inference and interpretable nodes that involve only a few features in their decision.
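The core idea of alternating optimization over a fixed tree structure can be illustrated with a minimal sketch. This is hypothetical illustrative code, not the authors' implementation: it handles only the axis-aligned case, re-optimizes one node at a time by exhaustive search over (feature, threshold) splits on the points reaching that node with the rest of the tree held fixed, and sets each leaf to the majority class of its reaching points. The `Node`, `tao_pass` and other names are invented for this example.

```python
from collections import Counter

class Node:
    """Internal node (feat, thresh, left, right) or leaf (label)."""
    def __init__(self, feat=None, thresh=None, left=None, right=None, label=None):
        self.feat, self.thresh = feat, thresh
        self.left, self.right, self.label = left, right, label

def predict(node, x):
    # Route x down the (sub)tree until a leaf is reached.
    while node.label is None:
        node = node.left if x[node.feat] <= node.thresh else node.right
    return node.label

def reaching(root, node, X, y):
    # Points that pass through `node` under the current tree parameters.
    out = []
    for xi, yi in zip(X, y):
        n, hit = root, False
        while True:
            if n is node:
                hit = True
            if n.label is not None:
                break
            n = n.left if xi[n.feat] <= n.thresh else n.right
        if hit:
            out.append((xi, yi))
    return out

def subtree_errors(node, pts):
    # Misclassification count of pts routed through this (sub)tree.
    return sum(predict(node, xi) != yi for xi, yi in pts)

def optimize_node(root, node, X, y):
    pts = reaching(root, node, X, y)
    if not pts:
        return
    if node.label is not None:
        # Leaf: majority class of the points reaching it.
        node.label = Counter(yi for _, yi in pts).most_common(1)[0][0]
        return
    # Internal node: exhaustive search over axis-aligned splits,
    # keeping both subtrees fixed; take the split only if it lowers
    # the misclassification of the reaching points.
    best = (subtree_errors(node, pts), node.feat, node.thresh)
    for f in range(len(pts[0][0])):
        for t in sorted({xi[f] for xi, _ in pts}):
            node.feat, node.thresh = f, t
            e = subtree_errors(node, pts)
            if e < best[0]:
                best = (e, f, t)
    _, node.feat, node.thresh = best

def tao_pass(root, X, y):
    # One alternating pass over all nodes, deepest first (reverse BFS).
    order, queue = [], [root]
    while queue:
        n = queue.pop(0)
        order.append(n)
        if n.label is None:
            queue += [n.left, n.right]
    for n in reversed(order):
        optimize_node(root, n, X, y)
```

Because each node is re-optimized exactly on its reduced problem while the rest of the tree stays fixed, a pass can never increase the training misclassification error, matching the monotone-decrease property claimed in the abstract. The oblique and sparse variants would replace the exhaustive axis-aligned search with a (sparsity-penalized) linear-classifier fit at each node.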
