AITopics | calibration measure

Collaborating Authors

calibration measure

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Testable and Actionable Calibration for Full Swap Regret

Bairaktari, Konstantina, Hu, Lunjia, Nguyen, Huy L., Ullman, Jonathan

arXiv.org Machine LearningMay-19-2026

AI generated predictions increasingly inform decision making in critical tasks, and therefore must be trustworthy. One widely used measure of trustworthiness is calibration, which requires that the predictions match the true frequencies and can be treated like real probabilities of a given outcome. However, defining calibration is subtle, and designing good measures of calibration error has been an active topic of recent research. The first goal is to find calibration measures that are actionable, meaning they can inform decision makers about their utility loss when predictions are treated as true probabilities, which is known as swap regret. The second goal is to find calibration measures that are testable, meaning that calibration error can be measured from a small sample of predictions and outcomes. Although these are very basic requirements, there is no existing calibration measure that fully satisfies both properties, and all existing measures relax actionability by bounding a weaker notion of swap regret, or relax testability by having suboptimal estimation error. We introduce a new calibration measure, Soft-Binned Calibration Decision Loss (SCDL), which we prove is fully actionable without weakening either requirement, and testable with nearly optimal error rate. In addition, SCDL satisfies other desired properties such as continuity and consistency. We also provide a set of experiments confirming that the theoretical advantages of SCDL compared to other measures lead to better performance in practice.

artificial intelligence, machine learning, scdl, (16 more...)

arXiv.org Machine Learning

2605.17749

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Computational and Statistical Hardness of Calibration Distance

Qiao, Mingda

arXiv.org Machine LearningMar-20-2026

The distance from calibration, introduced by Błasiok, Gopalan, Hu, and Nakkiran (STOC 2023), has recently emerged as a central measure of miscalibration for probabilistic predictors. We study the fundamental problems of computing and estimating this quantity, given either an exact description of the data distribution or only sample access to it. We give an efficient algorithm that exactly computes the calibration distance when the distribution has a uniform marginal and noiseless labels, which improves the $O(1/\sqrt{|\mathcal{X}|})$ additive approximation of Qiao and Zheng (COLT 2024) for this special case. Perhaps surprisingly, the problem becomes $\mathsf{NP}$-hard when either of the two assumptions is removed. We extend our algorithm to a polynomial-time approximation scheme for the general case. For the estimation problem, we show that $Θ(1/ε^3)$ samples are sufficient and necessary for the empirical calibration distance to be upper bounded by the true distance plus $ε$. In contrast, a polynomial dependence on the domain size -- incurred by the learning-based baseline -- is unavoidable for two-sided estimation. Our positive results are based on simple sparsifications of both the distribution and the target predictor, which significantly reduce the search space for computation and lead to stronger concentration for the estimation problem. To prove the hardness results, we introduce new techniques for certifying lower bounds on the calibration distance -- a problem that is hard in general due to its $\textsf{co-NP}$-completeness.

artificial intelligence, caldistd, machine learning, (18 more...)

arXiv.org Machine Learning

2603.18391

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.34)

Add feedback

d4cbcae8cfc8aa3ae897a1296e4e0cac-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 07:01:01 GMT

artificial intelligence, calibration measure, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.67)

Add feedback

d20e3c35bedb17a9f6f01fc434a30fa3-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 06:21:48 GMT

artificial intelligence, calibration, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Europe > France (0.04)
North America > United States > Virginia (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

1c336b8080f82bcc2cd2499b4c57261d-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 15:03:23 GMT

calibration, calibration error, estimator, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

X-CAL: Explicit Calibration for Survival Analysis Mark Goldstein

Neural Information Processing SystemsFeb-10-2026, 13:45:06 GMT

When a model's predicted number of events within any time interval is similar to the observed number, it is called well-calibrated . A survival model's calibration can be measured using, for instance, distributional calibration (

artificial intelligence, calibration, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Calibration tests in multi-class classification: A unifying framework

Neural Information Processing SystemsDec-25-2025, 02:12:12 GMT

In safety-critical applications a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibration of the most confident predictions only is often not sufficient. We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error. We propose and evaluate empirically different consistent and unbiased estimators for a specific class of measures based on matrix-valued kernels. Importantly, these estimators can be interpreted as test statistics associated with well-defined bounds and approximations of the p-value under the null hypothesis that the model is calibrated, significantly improving the interpretability of calibration measures, which otherwise lack any meaningful unit or scale.

calibration test, multi-class classification, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Add feedback

Truthfulness of Calibration Measures

Neural Information Processing SystemsOct-10-2025, 17:45:03 GMT

We study calibration measures in a sequential prediction setup.

calibration measure, forecaster, ssce, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.67)

Add feedback

Testing Calibration in Nearly-Linear Time

Neural Information Processing SystemsOct-10-2025, 17:29:10 GMT

Probabilistic predictions are at the heart of modern data science.

algorithm, calibration, testing problem, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Europe > France (0.04)
North America > United States > Virginia (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Making and Evaluating Calibrated Forecasts

Lu, Yuxuan, Wu, Yifan, Hartline, Jason, Hu, Lunjia

arXiv.org Machine LearningOct-9-2025

Calibrated predictions can be reliably interpreted as probabilities. An important step towards achieving better calibration is to design an appropriate calibration measure to meaningfully assess the miscalibration level of a predictor. A recent line of work initiated by Haghtalab et al. [2024] studies the design of truthful calibration measures: a truthful measure is minimized when a predictor outputs the true probabilities, whereas a non-truthful measure incentivizes the predictor to lie so as to appear more calibrated. All previous calibration measures were non-truthful until Hartline et al. [2025] introduced the first perfectly truthful calibration measures for binary prediction tasks in the batch setting. We introduce a perfectly truthful calibration measure for multi-class prediction tasks, generalizing the work of Hartline et al. [2025] beyond binary prediction. We study common methods of extending calibration measures from binary to multi-class prediction and identify ones that do or do not preserve truthfulness. In addition to truthfulness, we mathematically prove and empirically verify that our calibration measure exhibits superior robustness: it robustly preserves the ordering between dominant and dominated predictors, regardless of the choice of hyperparameters (bin sizes). This result addresses the non-robustness issue of binned ECE, which has been observed repeatedly in prior work.

calibration measure, classwise, predictor, (16 more...)

arXiv.org Machine Learning

2510.06388

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback