AITopics | calibration metric

KSP: Kolmogorov-Smirnov metric-based Post-Hoc Calibration for Survival Analysis

Neural Information Processing Systems | Jun-15-2026, 15:48:35 GMT

We propose a new calibration method for survival models based on the Kolmogorov-Smirnov (KS) metric. Existing approaches--including conformal prediction, D-calibration, and Kaplan-Meier (KM)-based methods--often rely on heuristic binning or additional nonparametric estimators, which undermine their adaptability to continuous-time settings and complex model outputs. To address these limitations, we introduce a streamlined KS metric-based post-processing framework (KSP) that calibrates survival predictions without relying on discretization or KM estimation. This design enhances flexibility and broad applicability. We conduct extensive experiments on diverse real-world datasets using a variety of survival models. Empirical results demonstrate that our method consistently improves calibration performance over existing methods while maintaining high predictive accuracy. We also provide a theoretical analysis of the KS metric and discuss extensions to in-processing settings.
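The abstract does not spell out the KS metric, so the following is only a minimal sketch of the underlying idea, not the KSP method itself. It assumes uncensored event times and uses the standard fact that, for a calibrated model, the predicted CDF evaluated at the observed event time is uniformly distributed on [0, 1]. The one-sample Kolmogorov-Smirnov statistic then measures the largest gap between the empirical CDF of those values and the uniform CDF, with no binning. The function names, the exponential toy model, and the data are hypothetical.

```python
import math


def ks_distance_to_uniform(values):
    """Largest gap between the empirical CDF of `values` and the Uniform(0, 1) CDF."""
    u = sorted(values)
    n = len(u)
    if n == 0:
        raise ValueError("need at least one value")
    gap = 0.0
    for i, x in enumerate(u):
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        gap = max(gap, abs((i + 1) / n - x), abs(x - i / n))
    return gap


def exponential_cdf(t, rate):
    """Predicted probability that the event happens by time t (toy exponential model)."""
    return 1.0 - math.exp(-rate * t)


# Hypothetical toy data: observed (uncensored) event times and per-sample predicted rates.
event_times = [0.2, 0.5, 1.0, 1.7, 3.0]
predicted_rates = [1.0, 1.0, 1.0, 1.0, 1.0]

# Predicted CDF evaluated at each observed event time.
pit_values = [exponential_cdf(t, r) for t, r in zip(event_times, predicted_rates)]
ks = ks_distance_to_uniform(pit_values)
print(f"KS distance to uniform: {ks:.3f}")
```

A value near 0 suggests the predicted distributions match the observed event times; handling censored observations, which the paper addresses, is deliberately left out of this sketch.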

artificial intelligence, machine learning, non-calibrated 0, (18 more...)

Country:

North America > United States (0.67)
North America > Canada (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Divide et Calibra: Multiclass Local Calibration via Vector Quantization

Barbera, Cesare, Perini, Lorenzo, De Toni, Giovanni, Passerini, Andrea, Pugnana, Andrea

arXiv.org Machine Learning | May-21-2026

Accurate and well-calibrated Machine Learning (ML) models are mandatory in high-stakes settings, yet effective multiclass calibration remains challenging: global approaches assume calibration errors are homogeneous across the latent space, while local methods often rely on latent-space dimensionality reduction, which leads to information loss. To address these issues, we propose a compositional approach to multiclass calibration, where region-specific calibration maps are constructed from shared codeword-dependent factors. We instantiate this idea via Vector Quantization (VQ), which induces a structured partition of the representation space, and an indexed parameterization of Dirichlet concentrations that enables parameter sharing across regions. Our approach learns heterogeneous calibration maps that generalize well even to sparse regions of the latent space. Experiments on benchmark datasets show significant improvements in local calibration while maintaining competitive global calibration and predictive performance.

calibration, data mining, machine learning, (20 more...)

2605.2106

Country: Europe (0.67)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
(2 more...)

Unified Approach for Weakly Supervised Multicalibration

Futami, Futoshi, Ishida, Takashi

arXiv.org Machine Learning | May-12-2026

Multicalibration requires predicted scores to agree with label probabilities across rich families of subgroups and score-dependent tests, but existing methods require clean input-label pairs for evaluation and post-processing. This assumption fails in weakly supervised learning (WSL) regimes -- including positive-unlabeled, unlabeled-unlabeled, and positive-confidence learning -- where clean labels are costly or unavailable even though reliable uncertainty estimates may be crucial. We address this gap by developing estimators of multicalibration error and post-hoc correction methods for WSL settings in which clean input-label pairs are unavailable. We propose a unified framework for estimating and correcting multicalibration under weak supervision by combining contamination-matrix risk rewrites with witness-based calibration constraints, yielding corrected multicalibration moments with finite-sample guarantees. We further propose weak-label multicalibration boost (WLMC), a generic post-hoc recalibration algorithm under weak supervision. Finally, we conduct experiments across multiple weak-supervision settings to evaluate multicalibration behavior and offer empirical insight into uncertainty estimation under weak supervision.

artificial intelligence, machine learning, pconf, (14 more...)

2605.09857

Country: Asia > Japan (0.27)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)

Calibration by Distribution Matching: Trainable Kernel Calibration Metrics

Marx, Charles

Neural Information Processing Systems | Feb-11-2026, 20:02:06 GMT

These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
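As a rough illustration of what a differentiable sample estimate of a kernel calibration metric can look like, the sketch below computes a squared kernel calibration error for binary predictions. It is a generic estimator from the kernel calibration family, not necessarily the metric defined in this paper; the Gaussian kernel, its width, and the function names are assumptions made for the example.

```python
import math


def gaussian_kernel(a, b, width=0.2):
    """Similarity between two predicted probabilities (smooth in both arguments)."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def kernel_calibration_error_sq(probs, labels, width=0.2):
    """Plug-in estimate of a squared kernel calibration error for binary predictions.

    For a calibrated model the residual (label - probability) averages to zero at
    every probability level, so the kernel-weighted residual products are small.
    """
    n = len(probs)
    residuals = [y - p for p, y in zip(probs, labels)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += residuals[i] * residuals[j] * gaussian_kernel(probs[i], probs[j], width)
    return total / (n * n)


# Hypothetical predictions: the second set is strongly overconfident.
print(kernel_calibration_error_sq([0.5, 0.5], [1, 0]))
print(kernel_calibration_error_sq([0.9, 0.9], [0, 0]))
```

Because the estimate is built only from sums, products, and a smooth kernel of the predicted probabilities, the same expression written in an automatic-differentiation framework could be added as a penalty to a training loss.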

calibration, data mining, machine learning, (17 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Wisconsin (0.04)
Europe > Portugal > Porto > Porto (0.04)

Genre: Research Report (0.67)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science > Data Mining (0.68)

8bd31288ad8e9a31d519fdeede7ee47d-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 15:32:17 GMT

calibration, dataset, film-ensemble, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Uncertainty Estimation for Safety-critical Scene Segmentation via Fine-grained Reward Maximization

Neural Information Processing Systems | Dec-26-2025, 01:06:30 GMT

Uncertainty estimation plays an important role in the reliable deployment of deep segmentation models in safety-critical scenarios such as medical applications. However, existing methods for uncertainty estimation have been limited by the lack of explicit guidance for calibrating the prediction risk and model confidence. In this work, we propose a novel fine-grained reward maximization (FGRM) framework that addresses uncertainty estimation by directly combining an uncertainty-metric-related reward function with a reinforcement-learning-based model tuning algorithm. This gives uncertainty estimation direct optimization guidance for model calibration. Specifically, our method designs a new uncertainty estimation reward function using the calibration metric, which is maximized to fine-tune an evidential-learning pre-trained segmentation model for calibrating prediction risk.

estimation, safety-critical scene segmentation, uncertainty estimation, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Appendix

Neural Information Processing Systems | Nov-14-2025, 20:10:39 GMT

The Appendix is structured as follows: A Models and Datasets: Details and references for the models and datasets used in this work. Table 1 provides an overview of the models used in this study. Table 1: Overview of models used in this study. A.2 Datasets: We evaluate accuracy and calibration on the following benchmark datasets: 1. V2 (Recht et al., 2019) is a new I The dataset contains 10 000 images. 3. In addition, the following datasets are used for pretraining as described in the text: 1.

calibration, classification error, variant, (14 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Uncertainty Calibration of Multi-Label Bird Sound Classifiers

Schwinger, Raphael, McEwen, Ben, Kather, Vincent S., Heinrich, René, Rauch, Lukas, Tomforde, Sven

arXiv.org Artificial Intelligence | Nov-12-2025

Passive acoustic monitoring enables large-scale biodiversity assessment, but reliable classification of bioacoustic sounds requires not only high accuracy but also well-calibrated uncertainty estimates to ground decision-making. In bioacoustics, calibration is challenged by overlapping vocalisations, long-tailed species distributions, and distribution shifts between training and deployment data. The calibration of multi-label deep learning classifiers within the domain of bioacoustics has not yet been assessed. We systematically benchmark the calibration of four state-of-the-art multi-label bird sound classifiers on the BirdSet benchmark, evaluating global, per-dataset, and per-class calibration using threshold-free calibration metrics (ECE, MCS) alongside discrimination metrics (cmAP). Model calibration varies significantly across datasets and classes. While Perch v2 and ConvNeXt$_{BS}$ show better global calibration, results vary between datasets. Both models are consistently underconfident, while AudioProtoPNet and BirdMAE are mostly overconfident. Surprisingly, calibration seems to be better for less frequent classes. Using simple post hoc calibration methods, we demonstrate a straightforward way to improve calibration. A small labelled calibration set is sufficient to significantly improve calibration with Platt scaling, while global calibration parameters suffer from dataset variability. Our findings highlight the importance of evaluating and improving uncertainty calibration in bioacoustic classifiers.
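To make the two main ingredients of this abstract concrete, here is a minimal sketch of a binned expected calibration error (ECE) for a single label and of Platt scaling fitted by plain gradient descent. It is not the paper's evaluation code: the equal-width binning, the learning rate, the step count, and the toy logits and labels are all assumptions, and in a multi-label setting the functions would be applied per class.

```python
import math


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean predicted probability and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_p - freq)
    return ece


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(probs, labels):
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probs, labels)) / len(probs)


def fit_platt(scores, labels, lr=0.5, steps=500):
    """Fit a and b in sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0  # start from the identity mapping on logits
    n = len(scores)
    for _ in range(steps):
        errors = [sigmoid(a * s + b) - y for s, y in zip(scores, labels)]
        a -= lr * sum(e * s for e, s in zip(errors, scores)) / n
        b -= lr * sum(errors) / n
    return a, b


# Hypothetical calibration set: raw logits for one species and its binary labels.
logits = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(logits, labels)
before = [sigmoid(s) for s in logits]
after = [sigmoid(a * s + b) for s in logits]
print(log_loss(before, labels), log_loss(after, labels))
```

Platt scaling only learns two numbers per classifier, which is consistent with the abstract's remark that a small labelled calibration set can be enough.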

artificial intelligence, calibration, machine learning, (16 more...)

2511.08261

Country: Europe (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Multiclass Local Calibration With the Jensen-Shannon Distance

Barbera, Cesare, Perini, Lorenzo, De Toni, Giovanni, Passerini, Andrea, Pugnana, Andrea

arXiv.org Artificial Intelligence | Oct-31-2025

Developing trustworthy Machine Learning (ML) models requires their predicted probabilities to be well-calibrated, meaning they should reflect true-class frequencies. Among calibration notions in multiclass classification, strong calibration is the most stringent, as it requires all predicted probabilities to be simultaneously calibrated across all classes. However, existing approaches to multiclass calibration lack a notion of distance among inputs, which makes them vulnerable to proximity bias: predictions in sparse regions of the feature space are systematically miscalibrated. This is especially relevant in high-stakes settings, such as healthcare, where the sparse instances are exactly those most at risk of biased treatment. In this work, we address this main shortcoming by introducing a local perspective on multiclass calibration. First, we formally define multiclass local calibration and establish its relationship with strong calibration. Second, we theoretically analyze the pitfalls of existing evaluation metrics when applied to multiclass local calibration. Third, we propose a practical method for enhancing local calibration in Neural Networks, which enforces alignment between predicted probabilities and local estimates of class frequencies using the Jensen-Shannon distance. Finally, we empirically validate our approach against existing multiclass calibration techniques.
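The alignment idea in this abstract can be illustrated with a small sketch of the Jensen-Shannon distance between a predicted probability vector and a local estimate of class frequencies. The abstract does not say how the local estimates are obtained, so the neighbour-based frequency count below is an assumption, as are the function names and the toy data; this is not the authors' training procedure.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence, in [0, 1] with log base 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


def local_class_frequencies(neighbor_labels, n_classes):
    """Fraction of each class among the labels of nearby samples (hypothetical local estimate)."""
    counts = [0] * n_classes
    for label in neighbor_labels:
        counts[label] += 1
    return [c / len(neighbor_labels) for c in counts]


# Hypothetical example: a 3-class prediction and the labels of its 4 nearest neighbours.
predicted = [0.7, 0.2, 0.1]
local = local_class_frequencies([0, 0, 1, 2], n_classes=3)
print(local, js_distance(predicted, local))
```

The distance is 0 when the prediction matches the local frequencies exactly and 1 when the two distributions put their mass on disjoint classes.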

artificial intelligence, calibration, machine learning, (14 more...)

2510.26566

Country: Europe > Italy (0.28)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine (1.00)

Technology: