An RKHS Perspective on Tree Ensembles
Dagdoug, Mehdi; Dombry, Clement; Duchamps, Jean-Jil
Random Forests and Gradient Boosting are among the most effective algorithms for supervised learning on tabular data. Both belong to the class of tree-based ensemble methods, where predictions are obtained by aggregating many randomized regression trees. In this paper, we develop a theoretical framework for analyzing such methods through Reproducing Kernel Hilbert Spaces (RKHSs) constructed on tree ensembles--more precisely, on the random partitions generated by randomized regression trees. We establish fundamental analytical properties of the resulting Random Forest kernel, including boundedness, continuity, and universality, and show that a Random Forest predictor can be characterized as the unique minimizer of a penalized empirical risk functional in this RKHS, providing a variational interpretation of ensemble learning. We further extend this perspective to the continuous-time formulation of Gradient Boosting introduced by Dombry and Duchamps (2024a,b), and demonstrate that it corresponds to a gradient flow on a Hilbert manifold induced by the Random Forest RKHS. A key feature of this framework is that both the kernel and the RKHS geometry are data-dependent, offering a theoretical explanation for the strong empirical performance of tree-based ensembles. Finally, we illustrate the practical potential of this approach by introducing a kernel principal component analysis built on the Random Forest kernel, which enhances the interpretability of ensemble models, as well as GVI, a new geometric variable importance criterion.
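To make the idea of a kernel built on random partitions concrete, here is a minimal sketch of a partition-based kernel: the kernel value for two points is the fraction of random partitions in which they fall into the same cell. This is a common construction and only an illustration, not necessarily the exact definition used in the paper. In particular, the sketch assumes purely random axis-aligned splits on the unit cube, whereas the abstract stresses partitions that depend on the data. The names `sample_partition`, `cell_id`, and `partition_kernel` are hypothetical.

```python
import random


def sample_partition(dim, depth, rng):
    """Draw one purely random axis-aligned partition of [0, 1]^dim.

    Assumption: splits ignore the data (random axis, uniform cut point).
    A fitted regression tree would choose its splits from the sample.
    """
    def grow(lo, hi, level):
        if level == depth:
            return None  # leaf: no further split
        axis = rng.randrange(dim)
        cut = rng.uniform(lo[axis], hi[axis])
        left_hi = list(hi)
        left_hi[axis] = cut
        right_lo = list(lo)
        right_lo[axis] = cut
        return (axis, cut,
                grow(lo, left_hi, level + 1),
                grow(right_lo, hi, level + 1))

    return grow([0.0] * dim, [1.0] * dim, 0)


def cell_id(tree, x):
    """Identify the cell containing x by its sequence of left/right turns."""
    path = []
    node = tree
    while node is not None:
        axis, cut, left, right = node
        go_right = x[axis] >= cut
        path.append(go_right)
        node = right if go_right else left
    return tuple(path)


def partition_kernel(x, y, trees):
    """Fraction of partitions in which x and y share a cell."""
    same = sum(cell_id(t, x) == cell_id(t, y) for t in trees)
    return same / len(trees)


rng = random.Random(0)
trees = [sample_partition(dim=2, depth=3, rng=rng) for _ in range(200)]

a, b, c = [0.20, 0.30], [0.22, 0.31], [0.90, 0.85]
k_aa = partition_kernel(a, a, trees)  # a point always shares a cell with itself
k_ab = partition_kernel(a, b, trees)  # nearby points: rarely separated
k_ac = partition_kernel(a, c, trees)  # distant points: usually separated
```

Each single partition defines a positive semi-definite indicator kernel, so their average is positive semi-definite and bounded by 1, which is consistent with the boundedness property mentioned above. Replacing the random splits with those of fitted trees would make the kernel depend on the data.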
Dec-2-2025
- Country:
  - Europe
    - France (0.04)
    - United Kingdom > England
      - West Sussex (0.04)
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - United States
      - California
        - Alameda County > Berkeley (0.04)
        - San Francisco County > San Francisco (0.14)
      - Massachusetts > Middlesex County
        - Cambridge (0.04)
      - New York > New York County
        - New York City (0.14)
- Genre:
  - Research Report > New Finding (0.45)
- Industry:
  - Health & Medicine > Therapeutic Area (0.67)
- Technology: