Supplementary Material for the Paper " Joints in Random Forests "

Neural Information Processing Systems 

Let f be a DT classifier and p(Y | x) be a corresponding GeDT classifier, where each leaf in GeDT is class-factorised, i.e. of the form p(Y)p(X), and where p(Y) has been estimated in the maximum-likelihood sense. Then f(x) = p(Y | x), provided that p(x) > 0. Proof. Recall that the leaves in the GeDT are in one-to-one correspondence with the leaf cells A of the DT, and that the support of any leaf is given by its corresponding A A. Let v Since the GeDT is deterministic, it has at most one non-zero child. Before proving Theorem 2 we need to introduce some background. This theorem extends consistency results for collections of partitions of the state space X, as discussed by Lugosi and Nobel [12].