Universal guarantees for decision tree induction via a higher-order splitting criterion

Neural Information Processing Systems 

We propose a simple extension of {\sl top-down decision tree learning heuristics} such as ID3, C4.5, and CART. Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform distribution, circumventing impossibility results showing that existing heuristics fare poorly even for simple target functions. The crux of our extension is a new splitting criterion that takes into account the correlations between $f$ and {\sl small subsets} of its attributes.