multicalibration
A Omitted Proofs
[Fragment; recoverable content: a high-level overview of the MCBoost algorithm of [20] and weak agnostic learning. Algorithm 2 (MCBoost) takes a hypothesis class C and a tolerance τ > 0, receives a dataset S sampled from D, and initializes p(x) ← 1/2.]
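The MCBoost loop mentioned above can be illustrated with a minimal sketch of the boosting pattern: repeatedly find a group where the predictor's calibration violation exceeds a tolerance, then patch the predictions on that group. The group class (boolean masks), tolerance `alpha`, and additive update rule here are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def mcboost(x, y, groups, alpha=0.05, rounds=100):
    """Sketch of a multicalibration boosting loop.

    x: features (unused in this simplified sketch); y: labels in [0, 1];
    groups: list of boolean masks over the data, standing in for the
    hypothesis class C.
    """
    p = np.full(len(y), 0.5)               # initialize p(x) <- 1/2
    for _ in range(rounds):
        # find the group with the largest average calibration violation
        worst, worst_gap = None, alpha
        for g in groups:
            if not g.any():
                continue
            gap = (y[g] - p[g]).mean()     # bias of p on group g
            if abs(gap) > abs(worst_gap):
                worst, worst_gap = g, gap
        if worst is None:                  # all violations below alpha
            break
        # patch the predictions on the violating group
        p[worst] = np.clip(p[worst] + worst_gap, 0.0, 1.0)
    return p
```

On data where two groups have very different label frequencies, the loop converges in a few rounds to group-wise calibrated predictions.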
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
Optimal Lower Bounds for Online Multicalibration
Natalie Collina, Jiuyao Lu, Georgy Noarov, Aaron Roth
We prove tight lower bounds for online multicalibration, establishing an information-theoretic separation from marginal calibration. In the general setting where group functions can depend on both context and the learner's predictions, we prove an $\Omega(T^{2/3})$ lower bound on expected multicalibration error using just three disjoint binary groups. This matches the upper bounds of Noarov et al. (2025) up to logarithmic factors and exceeds the $O(T^{2/3-\varepsilon})$ upper bound for marginal calibration (Dagan et al., 2025), thereby separating the two problems. We then turn to lower bounds for the more difficult case of group functions that may depend on context but not on the learner's predictions. In this case, we establish an $\widetilde{\Omega}(T^{2/3})$ lower bound for online multicalibration via a $\Theta(T)$-sized group family constructed using orthogonal function systems, again matching the upper bounds up to logarithmic factors.
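To make the quantity being bounded concrete, here is a simplified, offline illustration of multicalibration error: the largest absolute bias of the predictions, accumulated within any (group, prediction-bin) cell. The bin count and the unnormalized sum are assumptions of this sketch, not the paper's exact definition.

```python
import numpy as np

def multicalibration_error(preds, labels, group_masks, n_bins=10):
    """Worst-cell empirical multicalibration error.

    preds: predictions in [0, 1]; labels: outcomes in [0, 1];
    group_masks: boolean masks identifying each group.
    """
    preds, labels = np.asarray(preds, float), np.asarray(labels, float)
    # discretize predictions into n_bins buckets
    bins = np.minimum((preds * n_bins).astype(int), n_bins - 1)
    worst = 0.0
    for g in group_masks:
        for b in range(n_bins):
            cell = g & (bins == b)
            if cell.any():
                # accumulated bias within this (group, bin) cell
                worst = max(worst, abs((labels[cell] - preds[cell]).sum()))
    return worst
```

A perfectly calibrated predictor scores 0; a constant prediction of 0.5 against all-ones labels accumulates error linearly in the group size, which is the kind of growth the $\Omega(T^{2/3})$ lower bounds constrain.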
Bridging Multicalibration and Out-of-distribution Generalization Beyond Covariate Shift
We establish a new model-agnostic optimization framework for out-of-distribution generalization via multicalibration, a criterion that ensures a predictor is calibrated across a family of overlapping groups. Multicalibration has been shown to be associated with robustness of statistical inference under covariate shift. We further establish a link between multicalibration and robustness for prediction tasks both under and beyond covariate shift. We accomplish this by extending multicalibration to incorporate grouping functions that consider covariates and labels jointly. This yields an equivalence between the extended multicalibration and invariance, an objective for robust learning in the presence of concept shift. We show that the grouping function class has a linear structure spanned by density ratios, resulting in a unifying framework for robust learning via the design of specific grouping functions. We propose MC-Pseudolabel, a post-processing algorithm that achieves both extended multicalibration and out-of-distribution generalization. With lightweight hyperparameters and optimization through a series of supervised regression steps, the algorithm achieves superior performance on real-world datasets with distribution shift.
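The "series of supervised regression steps" structure can be sketched loosely: alternate between regressing the labels onto the current predictions together with grouping-function outputs (to form pseudo-labels), and refitting the base model on those pseudo-labels. The grouping functions and both regression steps here are illustrative assumptions, not the paper's exact MC-Pseudolabel procedure.

```python
import numpy as np

def _linreg(A, t):
    """Least-squares linear fit with intercept; returns a predict fn."""
    A1 = np.column_stack([A, np.ones(len(A))])
    w, *_ = np.linalg.lstsq(A1, t, rcond=None)
    return lambda B: np.column_stack([B, np.ones(len(B))]) @ w

def mc_pseudolabel_sketch(X, y, grouping_fns, rounds=3):
    """Loose sketch of an alternating pseudo-labeling loop.

    grouping_fns: callables (X, y) -> per-example grouping values;
    hypothetical stand-ins for the paper's grouping function class.
    """
    f = _linreg(X, y)
    for _ in range(rounds):
        preds = f(X)
        # step 1: regress labels on (predictions, grouping outputs)
        G = np.column_stack([preds] + [g(X, y) for g in grouping_fns])
        pseudo = _linreg(G, y)(G)
        # step 2: refit the base model on the pseudo-labels
        f = _linreg(X, pseudo)
    return f
```

On data the base model can already fit, the loop is a fixed point; its value comes from shifted data, where the grouping regressions pull the model toward invariant structure.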
Multicalibration for LLM-based Code Generation
Viola Campos, Robin Kuschnereit, Adrian Ulges
As AI-based code generation becomes widespread, researchers are investigating the calibration of code LLMs - ensuring their confidence scores faithfully represent the true likelihood of code correctness. To this end, we investigate multicalibration, which can capture additional factors about a coding problem, such as complexity, code length, or the programming language used. We study four multicalibration approaches on three function synthesis benchmarks, using latest-generation code LLMs (Qwen3 Coder, GPT-OSS, DeepSeek-R1-Distill). Our results demonstrate that multicalibration can yield distinct improvements over both uncalibrated token likelihoods (+1.03 in skill score) and baseline calibrations (+0.37 in skill score). We study the influence of the aforementioned factors in ablations, and make our dataset (consisting of code generations, likelihoods, and correctness labels) available for future research on code LLM calibration.
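One simple multicalibration-style post-hoc scheme for confidence scores of this kind is per-group histogram binning: within each group (e.g. programming language) and confidence bin, replace the raw score by the empirical correctness rate. This is a generic recalibration sketch, not necessarily one of the four approaches the paper studies; the grouping and bin count are illustrative.

```python
import numpy as np

def groupwise_binning_calibrate(conf, correct, groups, n_bins=10):
    """Per-group histogram binning of confidence scores.

    conf: raw confidences in [0, 1]; correct: 0/1 correctness outcomes;
    groups: per-example group identifiers (e.g. language names).
    """
    conf = np.asarray(conf, float)
    correct = np.asarray(correct, float)
    groups = np.asarray(groups)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    out = conf.copy()
    for g in np.unique(groups):
        for b in range(n_bins):
            cell = (groups == g) & (bins == b)
            if cell.any():
                # replace raw confidence by the cell's empirical accuracy
                out[cell] = correct[cell].mean()
    return out
```

For example, four generations scored 0.9 of which only one is correct all get recalibrated to 0.25.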
Multicalibration yields better matchings
Riccardo Colini Baldeschi, Simone Di Gregorio, Simone Fioravanti, Federico Fusco, Ido Guy, Daniel Haimovich, Stefano Leonardi, Fridolin Linder, Lorenzo Perini, Matteo Russo, Niek Tax
Consider the problem of finding the best matching in a weighted graph when we only have access to predictions of the actual stochastic weights, based on an underlying context. If the predictor is the Bayes optimal one, then computing the best matching based on the predicted weights is optimal. In practice, however, this perfect-information scenario is not realistic. Given an imperfect predictor, a suboptimal decision rule may compensate for the induced error and thus outperform the standard optimal rule. In this paper, we propose multicalibration as a way to address this problem. This fairness notion requires a predictor to be unbiased on each element of a family of protected sets of contexts. Given a class of matching algorithms $\mathcal C$ and any predictor $\gamma$ of the edge weights, we show how to construct a specific multicalibrated predictor $\hat{\gamma}$ with the following property: picking the best matching based on the output of $\hat{\gamma}$ is competitive with the best decision rule in $\mathcal C$ applied to the original predictor $\gamma$. We complement this result by providing sample complexity bounds.
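The "plug-in" decision rule the paper starts from, computing the best matching under predicted weights, can be sketched for the bipartite case with a standard assignment solver. The function name and the use of SciPy's `linear_sum_assignment` are choices of this sketch, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_from_predictions(pred_weights):
    """Maximum-weight bipartite matching under predicted edge weights.

    pred_weights: (n_left, n_right) matrix of predicted weights, e.g.
    the output of a multicalibrated predictor evaluated on the context.
    Returns a list of matched (left, right) index pairs.
    """
    rows, cols = linear_sum_assignment(pred_weights, maximize=True)
    return list(zip(rows.tolist(), cols.tolist()))
```

With $W = \begin{pmatrix}1&5\\5&1\end{pmatrix}$ this picks the cross edges of total weight 10; the paper's point is that running this rule on a multicalibrated $\hat{\gamma}$ is competitive with the best rule in $\mathcal C$ on the raw predictor.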
Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications
Daniel Lee
We introduce a simple but general online learning framework in which a learner plays against an adversary in a vector-valued game that changes every round. Even though the learner's objective is not convex-concave (and so the minimax theorem does not apply), we give a simple algorithm that can compete with the benchmark setting in which the adversary must announce their action first, with optimally diminishing regret.