 prediction interval


LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees

Santos, Vagner, Coscrato, Victor, Cabezas, Luben, Izbicki, Rafael, Ramos, Thiago

arXiv.org Machine Learning

Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under heteroscedasticity. Methods that improve adaptivity typically fit auxiliary nuisance models or introduce additional data splits/partitions to learn the conformal score, increasing cost and reducing data efficiency. We propose LoBoost, a model-native local conformal method that reuses the fitted ensemble's leaf structure to define multiscale calibration groups. Each input is encoded by its sequence of visited leaves; at resolution level k, we group points by matching prefixes of leaf indices across the first k trees and calibrate residual quantiles within each group. LoBoost requires no retraining, auxiliary models, or extra splitting beyond the standard train/calibration split. Experiments show competitive interval quality, improved test MSE on most datasets, and large calibration speedups.
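The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each point's leaf sequence is already available as a tuple of leaf indices (one per tree), uses absolute residuals as the conformal score, and falls back to the global split-conformal quantile when a test point's prefix was never seen during calibration. The function names, the fallback rule, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for the requested level
    return sorted(scores)[rank - 1]


def calibrate_by_leaf_prefix(leaf_paths, residuals, k, alpha):
    """Group calibration points by their first k leaf indices and
    compute one residual quantile per group, plus a global fallback."""
    groups = defaultdict(list)
    for path, r in zip(leaf_paths, residuals):
        groups[tuple(path[:k])].append(abs(r))
    local = {key: conformal_quantile(s, alpha) for key, s in groups.items()}
    global_q = conformal_quantile([abs(r) for r in residuals], alpha)
    return local, global_q


def predict_interval(point_pred, leaf_path, k, local, global_q):
    """Symmetric interval around the point prediction, using the group quantile
    when the prefix was seen in calibration (assumed fallback: global quantile)."""
    q = local.get(tuple(leaf_path[:k]), global_q)
    return point_pred - q, point_pred + q


# Toy calibration set: two trees, heteroscedastic residuals by first-tree leaf.
leaf_paths = [(0, 3), (0, 4), (0, 3), (0, 4), (1, 3), (1, 4), (1, 3), (1, 4)]
residuals = [0.1, -0.2, 0.15, 0.05, 1.0, -2.0, 1.5, 0.5]

local, global_q = calibrate_by_leaf_prefix(leaf_paths, residuals, k=1, alpha=0.25)
low_noise = predict_interval(10.0, (0, 3), 1, local, global_q)
unseen = predict_interval(10.0, (2, 0), 1, local, global_q)
```

On this toy data the group with first-tree leaf 0 gets a much narrower interval than the group with leaf 1, which is the kind of adaptivity a single global quantile cannot provide. Larger k gives finer groups with fewer calibration points each, so a practical version would need some rule for choosing the resolution level.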


Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration

Neural Information Processing Systems

D_n) → 1 as n → ∞, which means that most of the posterior mass falls in a neighbourhood of the true parameter. Remark on the notation: ν() is similar to ν() defined in Section 2.1 of the main text. The notation used in this proof is the same as in the proof of Theorem 2.1. Theorem 2.2 implies that a faithful prediction interval can be constructed for the sparse neural network learned by the proposed algorithms. In practice, for a normal regression problem with noise $N(0, \sigma^2)$, to construct the prediction interval for a test point $x_0$, the terms $\sigma^2$ and $\Sigma = \nabla_\gamma \mu(\beta, x_0)^T H^{-1} \nabla_\gamma \mu(\beta, x_0)$ in Theorem 2.2 need to be estimated from data.