Random Forests as Statistical Procedures: Design, Variance, and Dependence

Mar-3-2026–arXiv.org Machine Learning

We develop a finite-sample, design-based theory for random forests in which each tree is a randomized conditional predictor acting on fixed covariates and the forest is their Monte Carlo average. An exact variance identity separates Monte Carlo error from a covariance floor that persists under infinite aggregation. The floor arises through two mechanisms: observation reuse, where the same training outcomes receive weight across multiple trees, and partition alignment, where independently generated trees discover similar conditional prediction rules. We prove that the floor is strictly positive under minimal conditions and show that alignment persists even when sample splitting eliminates observation overlap entirely. We introduce procedure-aligned synthetic resampling (PASR) to estimate the covariance floor, decomposing the total prediction uncertainty of a deployed forest into interpretable components. For continuous outcomes, the resulting prediction intervals achieve nominal coverage with a theoretically guaranteed conservative bias direction. For classification forests, the PASR estimator is asymptotically unbiased, providing the first pointwise confidence intervals for predicted conditional probabilities from a deployed forest. Nominal coverage is maintained across a range of design configurations for both outcome types, including high-dimensional settings. The underlying theory extends to any tree-based ensemble with an exchangeable tree-generating mechanism.
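The split between Monte Carlo error and a covariance floor can be illustrated with the textbook identity for an average of exchangeable predictors. If each tree prediction has variance v and any two trees have covariance c, the average of B trees has variance c + (v - c) / B, so the second term vanishes as B grows while c remains. The sketch below is not the paper's design-based identity or the PASR estimator: it is a toy Gaussian model, and the function names, the shared-plus-idiosyncratic construction, and all parameter values are assumptions made for illustration only.

```python
import random
import statistics


def forest_variance(tree_var, tree_cov, n_trees):
    """Closed-form variance of the average of n_trees exchangeable predictors.

    tree_cov is the floor; (tree_var - tree_cov) / n_trees is Monte Carlo error.
    """
    return tree_cov + (tree_var - tree_cov) / n_trees


def simulate_forest_variance(tree_var, tree_cov, n_trees, n_reps=20000, seed=0):
    """Empirical variance of a toy forest prediction at one query point.

    Assumed toy model: each tree = shared component + independent own noise.
    The shared component gives every pair of trees covariance tree_cov; it is
    only a stand-in for whatever induces dependence between real trees.
    """
    rng = random.Random(seed)
    shared_sd = tree_cov ** 0.5
    own_sd = (tree_var - tree_cov) ** 0.5
    forest_preds = []
    for _ in range(n_reps):
        shared = rng.gauss(0.0, shared_sd)
        trees = [shared + rng.gauss(0.0, own_sd) for _ in range(n_trees)]
        forest_preds.append(sum(trees) / n_trees)
    return statistics.pvariance(forest_preds)


if __name__ == "__main__":
    # Hypothetical values: per-tree variance 1.0, pairwise covariance 0.3.
    for b in (1, 10, 100):
        exact = forest_variance(1.0, 0.3, b)
        approx = simulate_forest_variance(1.0, 0.3, b, n_reps=5000)
        print(f"B={b:4d}  closed form={exact:.4f}  simulated={approx:.4f}")
```

In this toy setting, adding trees drives the simulated variance toward the assumed covariance of 0.3 rather than toward zero, which is the qualitative behavior the abstract attributes to the covariance floor.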

artificial intelligence, machine learning, prediction, (19 more...)


Country:
- North America > United States
  - North Carolina > Forsyth County
    - Winston-Salem (0.04)
  - California > Alameda County
    - Berkeley (0.04)

Genre:
- Research Report > Experimental Study (0.67)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Decision Tree Learning (0.71)
  - Ensemble Learning (0.71)
  - Statistical Learning (0.67)
