Minimax optimal rates for Mondrian trees and forests

Mourtada, Jaouad, Gaïffas, Stéphane, Scornet, Erwan

arXiv.org Machine Learning 

Originally introduced by [7], Random Forests (RF) are state-of-the-art classification and regression algorithms that proceed by averaging the forecasts of a number of randomized decision trees grown in parallel. Despite their widespread use and remarkable success in practical applications, the theoretical properties of such algorithms are still not fully understood. For an overview of theoretical results on random forests, see [5]. As a result of the complexity of the procedure, which combines sampling steps and feature selection, Breiman's original algorithm has proved difficult to analyze. Consequently, most theoretical studies focus on modified and stylized versions of Random Forests. Among these methods, Purely Random Forests (PRF) [6, 4, 3, 13, 2] that grow the individual trees independently of the sample, are particularly amenable to theoretical analysis. The consistency of such estimates (as well as other idealized RF procedures) was first obtained by [4], as a byproduct of the consistency of individual tree estimates. A recent line of research [25, 28, 18, 27] has sought to obtain some theoretical guarantees for RF variants that more closely resembled the algorithm used in practice. It should be noted, however, that most of these theoretical guarantees come at the price of assumptions either on the data structure or on the Random Forest algorithm itself, being thus still far from explaining the excellent empirical performance of Random Forests.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found