calibration
LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees
Santos, Vagner, Coscrato, Victor, Cabezas, Luben, Izbicki, Rafael, Ramos, Thiago
Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under heteroscedasticity. Methods that improve adaptivity typically fit auxiliary nuisance models or introduce additional data splits/partitions to learn the conformal score, increasing cost and reducing data efficiency. We propose LoBoost, a model-native local conformal method that reuses the fitted ensemble's leaf structure to define multiscale calibration groups. Each input is encoded by its sequence of visited leaves; at resolution level k, we group points by matching prefixes of leaf indices across the first k trees and calibrate residual quantiles within each group. LoBoost requires no retraining, auxiliary models, or extra splitting beyond the standard train/calibration split. Experiments show competitive interval quality, improved test MSE on most datasets, and large calibration speedups.
- South America > Brazil > São Paulo (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
JUCAL: Jointly Calibrating Aleatoric and Epistemic Uncertainty in Classification Tasks
Heiss, Jakob, Lambrecht, Sören, Weissteiner, Jakob, Wutte, Hanna, Žurič, Žan, Teichmann, Josef, Yu, Bin
We study post-calibration uncertainty for trained ensembles of classifiers. Specifically, we consider both aleatoric (label noise) and epistemic (model) uncertainty. Among the most popular and widely used calibration methods in classification are temperature scaling (i.e., pool-then-calibrate) and conformal methods. However, the main shortcoming of these calibration methods is that they do not balance the proportion of aleatoric and epistemic uncertainty. Not balancing these uncertainties can severely misrepresent predictive uncertainty, leading to overconfident predictions in some input regions while being underconfident in others. To address this shortcoming, we present a simple but powerful calibration algorithm Joint Uncertainty Calibration (JUCAL) that jointly calibrates aleatoric and epistemic uncertainty. JUCAL jointly calibrates two constants to weight and scale epistemic and aleatoric uncertainties by optimizing the negative log-likelihood (NLL) on the validation/calibration dataset. JUCAL can be applied to any trained ensemble of classifiers (e.g., transformers, CNNs, or tree-based methods), with minimal computational overhead, without requiring access to the models' internal parameters. We experimentally evaluate JUCAL on various text classification tasks, for ensembles of varying sizes and with different ensembling strategies. Our experiments show that JUCAL significantly outperforms SOTA calibration methods across all considered classification tasks, reducing NLL and predictive set size by up to 15% and 20%, respectively. Interestingly, even applying JUCAL to an ensemble of size 5 can outperform temperature-scaled ensembles of size up to 50 in terms of NLL and predictive set size, resulting in up to 10 times smaller inference costs. Thus, we propose JUCAL as a new go-to method for calibrating ensembles in classification.
- North America > United States (0.46)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Government (0.45)
- Health & Medicine (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- (2 more...)
When to Trust the Cheap Check: Weak and Strong Verification for Reasoning
Kiyani, Shayan, Noorani, Sima, Pappas, George, Hassani, Hamed
Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Externally, users inspect outputs and steer the model through feedback until results are trustworthy, which we call strong verification. These signals differ sharply in cost and reliability: strong verification can establish trust but is resource-intensive, while weak verification is fast and scalable but noisy and imperfect. We formalize this tension through weak--strong verification policies, which decide when to accept or reject based on weak verification and when to defer to strong verification. We introduce metrics capturing incorrect acceptance, incorrect rejection, and strong-verification frequency. Over population, we show that optimal policies admit a two-threshold structure and that calibration and sharpness govern the value of weak verifiers. Building on this, we develop an online algorithm that provably controls acceptance and rejection errors without assumptions on the query stream, the language model, or the weak verifier.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia (0.04)
- North America > United States > Ohio (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Germany > Saarland > Saarbrücken (0.04)
- Asia > Middle East > Jordan (0.04)