AITopics | calibration test

Calibration tests in multi-class classification: A unifying framework

Neural Information Processing Systems | Dec-25-2025, 02:12:12 GMT

In safety-critical applications a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibration of the most confident predictions only is often not sufficient. We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error. We propose and evaluate empirically different consistent and unbiased estimators for a specific class of measures based on matrix-valued kernels. Importantly, these estimators can be interpreted as test statistics associated with well-defined bounds and approximations of the p-value under the null hypothesis that the model is calibrated, significantly improving the interpretability of calibration measures, which otherwise lack any meaningful unit or scale.
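For orientation, the expected calibration error mentioned in the abstract is commonly estimated by binning the most confident predictions. The following is a minimal sketch of that common binned estimator, not code from the paper; the function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration. It looks only at the top-label confidence, which is exactly the restriction the abstract describes as often insufficient.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned top-label ECE (illustrative sketch, hypothetical helper).

    probs:  (n, m) array of predicted class probabilities.
    labels: (n,) array of integer class labels in {0, ..., m-1}.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    confidences = probs.max(axis=1)        # confidence of the predicted class
    predictions = probs.argmax(axis=1)     # predicted class
    correct = (predictions == labels).astype(float)

    # Equal-width bins over [0, 1]; each bin is right-closed: (lo, hi].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # |accuracy - mean confidence| in the bin, weighted by bin frequency.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

The resulting number has no natural unit or scale, which is the interpretability gap that the abstract's test statistics and p-values are meant to address.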

calibration test, multi-class classification, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Fast and Scalable Score-Based Kernel Calibration Tests

Glaser, Pierre, Widmann, David, Lindsten, Fredrik, Gretton, Arthur

arXiv.org Machine Learning | Oct-17-2025

We introduce the Kernel Calibration Conditional Stein Discrepancy test (KCCSD test), a non-parametric, kernel-based test for assessing the calibration of probabilistic models with well-defined scores. In contrast to previous methods, our test avoids the need for possibly expensive expectation approximations while providing control over its type-I error. We achieve these improvements by using a new family of kernels for score-based probabilities that can be estimated without probability density samples, and by using a conditional goodness-of-fit criterion for the KCCSD test's U-statistic. We demonstrate the properties of our test on various synthetic settings.

artificial intelligence, kernel, machine learning, (13 more...)

arXiv.org Machine Learning

2510.14711

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A calibration test for evaluating set-based epistemic uncertainty representations

Jürgens, Mira, Mortier, Thomas, Hüllermeier, Eyke, Bengs, Viktor, Waegeman, Willem

arXiv.org Machine Learning | Feb-22-2025

The accurate representation of epistemic uncertainty is a challenging yet essential task in machine learning. A widely used representation corresponds to convex sets of probabilistic predictors, also known as credal sets. One popular way of constructing these credal sets is via ensembling or specialized supervised learning methods, where the epistemic uncertainty can be quantified through measures such as the set size or the disagreement among members. In principle, these sets should contain the true data-generating distribution. As a necessary condition for this validity, we adopt the strongest notion of calibration as a proxy. Concretely, we propose a novel statistical test to determine whether there is a convex combination of the set's predictions that is calibrated in distribution. In contrast to previous methods, our framework allows the convex combination to be instance dependent, recognizing that different ensemble members may be better calibrated in different regions of the input space. Moreover, we learn this combination via proper scoring rules, which inherently optimize for calibration. Building on differentiable, kernel-based estimators of calibration errors, we introduce a nonparametric testing procedure and demonstrate the benefits of capturing instance-level variability on synthetic and real-world experiments.

calibration, calibration error, convex combination, (12 more...)

arXiv.org Machine Learning

2502.16299

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Reviews: Calibration tests in multi-class classification: A unifying framework

Neural Information Processing Systems | Jan-22-2025, 02:22:49 GMT

Summary: The paper presents a novel unified theoretical framework and new measures for the calibration properties of multi-class classifiers, which generalize commonly used ones. Estimators for the proposed measures, based on vector-valued RKHS, are then proposed. The statistical properties of these estimators are characterized theoretically (including proofs), and statistical tests associated with the estimators are presented. Finally, the properties of the proposed estimators are validated exhaustively in supporting simulated experiments. Originality: The proposed ideas are novel in the context of calibrated multi-class classification.

estimator, multi-class classification, unifying framework, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.92)

Reviews: Calibration tests in multi-class classification: A unifying framework

Neural Information Processing Systems | Jan-22-2025, 02:22:39 GMT

The paper brings forward a new rigorous framework for the calibration of multi-class classification models. The reviewers found the contributions to be significant and original and the paper well written. The authors clarified the unclear points in their response.

calibration test, multi-class classification, unifying framework

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Calibration tests in multi-class classification: A unifying framework

Neural Information Processing Systems | Oct-9-2024, 15:14:29 GMT

In safety-critical applications a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibration of the most confident predictions only is often not sufficient. We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error. We propose and evaluate empirically different consistent and unbiased estimators for a specific class of measures based on matrix-valued kernels. Importantly, these estimators can be interpreted as test statistics associated with well-defined bounds and approximations of the p-value under the null hypothesis that the model is calibrated, significantly improving the interpretability of calibration measures, which otherwise lack any meaningful unit or scale.

calibration error, multi-class classification, unifying framework, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

EF-Calib: Spatiotemporal Calibration of Event- and Frame-Based Cameras Using Continuous-Time Trajectories

Wang, Shaoan, Xin, Zhanhua, Hu, Yaoqing, Li, Dongyue, Zhu, Mingzhu, Yu, Junzhi

arXiv.org Artificial Intelligence | May-27-2024

The event camera, a bio-inspired, asynchronously triggered camera, offers promising prospects for fusion with frame-based cameras owing to its low latency and high dynamic range. However, calibrating stereo vision systems that incorporate both event and frame-based cameras remains a significant challenge. In this letter, we present EF-Calib, a spatiotemporal calibration framework for event- and frame-based cameras using continuous-time trajectories. A novel calibration pattern applicable to both camera types and the corresponding event recognition algorithm are proposed. Leveraging the asynchronous nature of events, a differentiable piecewise B-spline that represents the camera pose continuously is introduced, enabling calibration of intrinsic parameters, extrinsic parameters, and time offset, with analytical Jacobians provided. Various experiments are carried out to evaluate the calibration performance of EF-Calib, including calibration experiments for intrinsic parameters, extrinsic parameters, and time offset. Experimental results show that EF-Calib achieves the most accurate intrinsic parameters compared to current SOTA, extrinsic-parameter accuracy close to that of the frame-based results, and accurate time offset estimation. EF-Calib provides a convenient and accurate toolbox for calibrating systems that fuse events and frames. The code of this paper will also be open-sourced at: https://github.com/wsakobe/EF-Calib.

ef-calib, event camera, frame-based camera, (16 more...)

arXiv.org Artificial Intelligence

2405.17278

Country:

Asia > China > Fujian Province > Fuzhou (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(6 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Frame-Oriented Architecture (1.00)

Calibration tests in multi-class classification: A unifying framework

Widmann, David, Lindsten, Fredrik, Zachariah, Dave

Neural Information Processing Systems | Mar-19-2020, 01:33:19 GMT

In safety-critical applications a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibration of the most confident predictions only is often not sufficient. We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error. We propose and evaluate empirically different consistent and unbiased estimators for a specific class of measures based on matrix-valued kernels. Importantly, these estimators can be interpreted as test statistics associated with well-defined bounds and approximations of the p-value under the null hypothesis that the model is calibrated, significantly improving the interpretability of calibration measures, which otherwise lack any meaningful unit or scale.
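To make the idea of an unbiased, kernel-based estimator concrete, here is a minimal sketch of a U-statistic estimate of a squared kernel calibration error. It is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, the matrix-valued kernel is assumed to have the simple form of a scalar Gaussian kernel on the predicted probability vectors times the identity matrix, and the bandwidth default is arbitrary.

```python
import numpy as np


def skce_unbiased(probs, labels, bandwidth=1.0):
    """Unbiased U-statistic estimate of a squared kernel calibration error.

    Illustrative sketch (hypothetical helper). Assumes the matrix-valued
    kernel k(p, q) = kappa(p, q) * I with a Gaussian kappa on predictions.

    probs:  (n, m) array of predicted class probabilities.
    labels: (n,) array of integer class labels in {0, ..., m-1}.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n, m = probs.shape
    if n < 2:
        raise ValueError("at least two samples are required")

    # Residuals between one-hot encoded labels and predicted probabilities.
    residuals = np.eye(m)[labels] - probs                      # shape (n, m)

    # Scalar Gaussian kernel between all pairs of predictions.
    sq_dists = ((probs[:, None, :] - probs[None, :, :]) ** 2).sum(axis=-1)
    kappa = np.exp(-sq_dists / (2.0 * bandwidth ** 2))         # shape (n, n)

    # h[i, j] = residual_i^T k(p_i, p_j) residual_j for the assumed kernel.
    h = kappa * (residuals @ residuals.T)

    # Average over distinct pairs i < j only; dropping the diagonal terms
    # is what makes the estimate unbiased.
    upper = np.triu_indices(n, k=1)
    return h[upper].mean()
```

Because the estimate averages over distinct pairs only, it can come out negative on small samples, which is expected for an unbiased estimator of a non-negative quantity. Under the null hypothesis of calibration its expectation is zero, which is what allows it to serve as a test statistic.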

calibration error, multi-class classification, unifying framework, (3 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
