Depth is More Powerful than Width with Prediction Concatenation in Deep Forest

Neural Information Processing Systems 

Random Forest (RF) is an ensemble learning algorithm proposed by \citet{breiman2001random} that constructs a large number of randomized decision trees individually and aggregates their predictions by naive averaging. The prediction concatenation (PreConc) operation is crucial for the multi-layer feature transformation in deep forest (DF), yet little is known about its theoretical properties. In this paper, we analyze the influence of PreConc on the consistency of deep forest. In particular, when the individual trees are inconsistent (in practice, each tree is often fully grown, i.e., there is only one sample at each leaf node), we find that the convergence rate of two-layer DF \textit{w.r.t.} the number of trees $M$ can reach $\mathcal{O}(1/M^2)$ under some mild conditions, while the convergence rate of RF is $\mathcal{O}(1/M)$. Therefore, with the help of PreConc, a deeper DF is more powerful than a shallower one.
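
As a rough illustration of the gap between the two rates, ignoring the constants hidden in the $\mathcal{O}(\cdot)$ notation (an assumption made here only for the sake of the example), taking $M = 100$ trees gives
\[
\frac{1}{M} = 10^{-2}, \qquad \frac{1}{M^2} = 10^{-4},
\]
so under this simplification the two-layer rate is smaller than the RF rate by a further factor of $M$.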