On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
Marchant, Neil G., Cullen, Andrew C., Liu, Feng, Erfani, Sarah M.
Deep neural networks often fail when deployed in real-world contexts due to distribution shift, a critical barrier to building safe and reliable systems. An emerging approach to address this problem relies on \emph{disagreement discrepancy} -- a measure of how the disagreement between two models changes under a shifting distribution. The process of maximizing this measure has seen applications in bounding error under shifts, testing for harmful shifts, and training more robust models. However, this optimization involves the non-differentiable zero-one loss, necessitating the use of practical surrogate losses. We prove that existing surrogates for disagreement discrepancy are not Bayes consistent, revealing a fundamental flaw: maximizing these surrogates can fail to maximize the true disagreement discrepancy. To address this, we introduce new theoretical results providing both upper and lower bounds on the optimality gap for such surrogates. Guided by this theory, we propose a novel disagreement loss that, when paired with cross-entropy, yields a provably consistent surrogate for disagreement discrepancy. Empirical evaluations across diverse benchmarks demonstrate that our method provides more accurate and robust estimates of disagreement discrepancy than existing approaches, particularly under challenging adversarial conditions.
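The zero-one disagreement discrepancy described above can be sketched in a few lines of numpy; the function names and the toy threshold classifiers here are our illustration, not the paper's implementation.

```python
import numpy as np

def zero_one_disagreement(preds_a, preds_b):
    """Fraction of points on which two classifiers' hard labels differ (0-1 loss)."""
    return float(np.mean(preds_a != preds_b))

def disagreement_discrepancy(h, hp, x_src, x_tgt):
    """disc(h, h') = disagreement on the target domain minus disagreement on the source."""
    return (zero_one_disagreement(h(x_tgt), hp(x_tgt))
            - zero_one_disagreement(h(x_src), hp(x_src)))

# Toy 1-D example: h thresholds at 0, h' thresholds at 1.
h  = lambda x: (x > 0.0).astype(int)
hp = lambda x: (x > 1.0).astype(int)

x_src = np.array([-2.0, -1.0, 2.0, 3.0])   # source: h and h' agree on every point
x_tgt = np.array([0.5, 0.7, 0.2, 2.5])     # target: they disagree on 3 of 4 points

print(disagreement_discrepancy(h, hp, x_src, x_tgt))  # 0.75 - 0.0 = 0.75
```

Because the indicator `preds_a != preds_b` is non-differentiable, maximizing this quantity over a neural critic requires a surrogate loss, which is exactly where the consistency question above arises.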
Appendix A Experimental Details
This is referred to as DOC-Feat in [24]. COT uses the empirical estimator of the Earth Mover's Distance between labels from the source domain and softmax outputs of samples from the target domain.

A.2 Dataset Details

In this section, we provide additional details about the datasets used in our benchmark study. Overall, we obtain 5 datasets (i.e., CIFAR10v1, CIFAR100, …). Similar to CIFAR10, we use the original CIFAR100 set as the source dataset. Overall, we obtain 3 different domains.
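COT's empirical Earth Mover's Distance between source labels and target softmax outputs can be computed exactly as an assignment problem when the two samples have equal size. The sketch below assumes uniform sample weights and a total-variation ground cost between probability vectors; the paper's exact cost function may differ, and `cot_emd` is our name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cot_emd(src_labels, tgt_softmax, n_classes):
    """Empirical EMD between one-hot source labels and target softmax outputs,
    computed via an exact optimal assignment (equal sample sizes assumed)."""
    onehot = np.eye(n_classes)[src_labels]                          # (n, k)
    # Pairwise total-variation cost between probability vectors, in [0, 1].
    cost = 0.5 * np.abs(onehot[:, None, :] - tgt_softmax[None, :, :]).sum(-1)
    rows, cols = linear_sum_assignment(cost)                        # exact OT plan
    return cost[rows, cols].mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=8)
logits = rng.normal(size=(8, 3))
softmax = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
print(cot_emd(labels, softmax, 3))
```

For large samples, an exact assignment is expensive; practical estimators typically use entropic regularization or mini-batch OT instead.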
ODD: Overlap-aware Estimation of Model Performance under Distribution Shift
Reliable and accurate estimation of the error of an ML model in unseen test domains is an important problem for safe intelligent systems. Prior work uses disagreement discrepancy (DIS^2) to derive practical error bounds under distribution shifts. It optimizes for a maximally disagreeing classifier on the target domain to bound the error of a given source classifier. Although this approach offers a reliable and competitively accurate estimate of the target error, we identify a problem in this approach that causes the disagreement discrepancy objective to compete in the overlapping region between the source and target domains. With the intuitive assumption that the target disagreement should be no more than the source disagreement in the overlapping region, due to sufficient support, we devise Overlap-aware Disagreement Discrepancy (ODD). Maximizing ODD only requires disagreement in the non-overlapping target region, removing the competition. Our ODD-based bound uses domain classifiers to estimate domain overlap and better predicts target performance than DIS^2. We conduct experiments on a wide array of benchmarks to show that our method improves the overall performance-estimation error while remaining valid and reliable. Our code and results are available on GitHub.
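The domain-classifier idea in ODD can be sketched as follows: train a source-vs-target classifier, treat target points it confidently flags as "target" as non-overlapping, and restrict the disagreement objective to that region. The function names, the hard probability threshold, and the toy data are our illustration of the idea, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def non_overlap_mask(x_src, x_tgt, threshold=0.5):
    """Train a source-vs-target domain classifier; target points it confidently
    assigns to the target domain are treated as non-overlapping."""
    X = np.vstack([x_src, x_tgt])
    d = np.concatenate([np.zeros(len(x_src)), np.ones(len(x_tgt))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p_tgt = clf.predict_proba(x_tgt)[:, 1]   # P(domain = target | x)
    return p_tgt > threshold

def odd_disagreement(h, hp, x_tgt, mask):
    """Disagreement restricted to the (estimated) non-overlapping target region."""
    if not mask.any():
        return 0.0
    return float(np.mean(h(x_tgt[mask]) != hp(x_tgt[mask])))

rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=(200, 2))
x_tgt = rng.normal(2.5, 1.0, size=(200, 2))   # shifted target
h  = lambda x: (x[:, 0] > 1.0).astype(int)
hp = lambda x: (x[:, 0] > 3.0).astype(int)
mask = non_overlap_mask(x_src, x_tgt)
print(odd_disagreement(h, hp, x_tgt, mask))
```

Restricting the maximization this way is what removes the competition in the overlap region: the critic is no longer rewarded for disagreeing on points the source classifier has support on.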
(Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy
We derive a new, (almost) guaranteed upper bound on the error of deep neural networks under distribution shift using unlabeled test data. Prior methods are either vacuous in practice or accurate on average but heavily underestimate error for a sizeable fraction of shifts. In particular, the latter only give guarantees based on complex continuous measures such as test calibration, which cannot be identified without labels, and are therefore unreliable. Instead, our bound requires a simple, intuitive condition which is well justified by prior empirical works and holds in practice effectively 100\% of the time. The bound is inspired by the \mathcal{H}\Delta\mathcal{H}-divergence but is easier to evaluate and substantially tighter, consistently providing non-vacuous test error upper bounds.
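As we understand the construction, the bound takes the following form: writing \epsilon_{\mathcal{S}} and \epsilon_{\mathcal{T}} for source and target error and \epsilon(h, h') for the disagreement between two classifiers, the target error of a fixed classifier h is bounded by its source error plus the maximal disagreement discrepancy over a critic class \mathcal{H}':

\epsilon_{\mathcal{T}}(h) \;\le\; \epsilon_{\mathcal{S}}(h) \;+\; \max_{h' \in \mathcal{H}'} \big[\, \epsilon_{\mathcal{T}}(h, h') - \epsilon_{\mathcal{S}}(h, h') \,\big]

The "simple, intuitive condition" mentioned in the abstract is what licenses replacing the unknown labeling function with the maximizing critic h'.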
Rosenfeld, Elan, Garg, Saurabh
When deploying a model, it is important to be confident in how it will perform under inevitable distribution shift. Standard methods for achieving this include data-dependent uniform convergence bounds (Ben-David et al., 2006, Mansour et al., 2009) (typically vacuous in practice) or assuming a precise model of how the distribution can shift (Chen et al., 2022, Rahimian and Mehrotra, 2019, Rosenfeld et al., 2021). Unfortunately, it is difficult or impossible to determine how severely these assumptions are violated by real data ("all models are wrong"), so practitioners usually cannot trust such bounds with confidence. To better estimate test performance in the wild, some recent work instead tries to directly predict the accuracy of neural networks using unlabeled data from the test distribution of interest (Baek et al., 2022, Garg et al., 2022, Lu et al., 2023). While these methods predict the test performance surprisingly well, they lack pointwise trustworthiness and verifiability: their estimates are good on average over all distribution shifts, but they provide no guarantee or signal of the quality of any individual prediction (here, each point is a single test distribution, for which a method predicts a classifier's average accuracy). Because of the opaque conditions under which these methods work, it is also difficult to anticipate their failure cases--indeed, it is reasonably common for them to substantially overestimate test accuracy for a particular shift, which is problematic when optimistic deployment can be costly or catastrophic.