Localized Uncertainty Quantification in Random Forests via Proximities

Rhodes, Jake S., Brown, Scott D., Wilkinson, J. Riley

arXiv.org Machine Learning 

Abstract--In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on incorporating uncertainty measures. While current methods often rely on quantile regression or Monte Carlo techniques, we propose a new approach using naturally occurring test sets and similarity measures (proximities) typically viewed as byproducts of random forests. Specifically, we form localized distributions of out-of-bag (OOB) errors around nearby points, defined using the proximities, to create prediction intervals for regression and trust scores for classification. By varying the number of nearby points, our intervals can be adjusted to achieve the desired coverage while retaining the flexibility that reflects the certainty of individual predictions. For classification, excluding points identified as unclassifiable by our method generally enhances the accuracy of the model and provides higher accuracy-rejection AUC scores than competing methods.

Although traditional machine learning models usually provide point estimates, there is growing recognition of the need to incorporate uncertainty to support more informed decisions [1]. By quantifying uncertainty, users can assess the reliability of model outputs and better interpret results, especially for out-of-distribution samples, through calibrated confidence estimates.
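The approach described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard scikit-learn random forest, defines proximity between two points as the fraction of trees in which they share a leaf, and forms a prediction interval for a test point from the quantiles of the OOB errors of its k most-proximate training points. The function names and the choice of k are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy regression data and a forest with OOB predictions enabled.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
rf = RandomForestRegressor(n_estimators=200, oob_score=True,
                           bootstrap=True, random_state=0).fit(X, y)

# OOB residuals for every training point (the "naturally occurring test set").
oob_errors = y - rf.oob_prediction_

def proximities(x):
    """Proximity of test point x to each training point: the fraction of
    trees in which the two points land in the same leaf."""
    train_leaves = rf.apply(X)                 # shape (n_train, n_trees)
    test_leaves = rf.apply(x.reshape(1, -1))   # shape (1, n_trees)
    return (train_leaves == test_leaves).mean(axis=1)

def prediction_interval(x, k=30, alpha=0.1):
    """Interval from the OOB errors of the k most-proximate training points.
    k controls the locality/coverage trade-off noted in the abstract."""
    prox = proximities(x)
    nearest = np.argsort(prox)[-k:]            # indices of the k closest points
    local_errors = oob_errors[nearest]
    point = rf.predict(x.reshape(1, -1))[0]
    lo, hi = np.quantile(local_errors, [alpha / 2, 1 - alpha / 2])
    return point + lo, point + hi

lo, hi = prediction_interval(X[0])
```

Widening or narrowing the local neighborhood (larger or smaller k) trades locality for calibration, which is the mechanism the abstract describes for tuning coverage per prediction.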