Depth is More Powerful than Width with Prediction Concatenation in Deep Forest

Neural Information Processing Systems 

Random Forest (RF) is an ensemble learning algorithm proposed by Breiman [1] that constructs a large number of randomized decision trees independently and aggregates their predictions by naive averaging. Zhou and Feng [2] further proposed the Deep Forest (DF) algorithm with multi-layer feature transformation, which significantly outperforms random forest in various application fields. The prediction concatenation (PreConc) operation is crucial for the multi-layer feature transformation in deep forest, yet little is known about its theoretical properties. In this paper, we analyze the influence of PreConc on the consistency of deep forest. In particular, when the individual trees are inconsistent (in practice, each tree is often fully grown, i.e., there is only one sample at each leaf node), we find that the convergence rate of two-layer DF w.r.t. the number of trees M can reach O(1/M