Inference with Mondrian Random Forests

Cattaneo, Matias D., Klusowski, Jason M., Underwood, William G.

arXiv.org Machine Learning 

Random forests, first introduced by Breiman (2001), are a workhorse in modern machine learning for classification and regression tasks. Their desirable traits include computational efficiency (via parallelization and greedy heuristics) in big data settings, simplicity of configuration and amenability to tuning parameter selection, ability to adapt to latent structure in high-dimensional data sets, and flexibility in handling mixed data types. Random forests have achieved great empirical success in many fields of study, including healthcare, finance, online commerce, text analysis, bioinformatics, image classification, and ecology. Since Breiman introduced random forests over twenty years ago, the study of their statistical properties has remained an active area of research: see Scornet et al. (2015), Chi et al. (2022), Klusowski and Tian (2023), and references therein, for a sample of recent developments. Many fundamental questions about Breiman's random forests remain unanswered, owing in part to the subtle ingredients present in the estimation procedure which make standard analytical tools ineffective. These technical difficulties stem from the way the constituent trees greedily partition the covariate space, utilizing both the covariate and response data. This creates complicated dependencies on the data that are often exceedingly hard to untangle without overly stringent assumptions, thereby hampering theoretical progress.
