AITopics | contingency table

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report (0.93)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.92)
Health & Medicine > Epidemiology (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Kübler, Jonas, Budhathoki, Kailash, Kleindessner, Matthäus, Zhou, Xiong, Yin, Junming, Khetan, Ashish, Karypis, George

When LLMs get significantly worse: A statistical approach to detect model degradations

arXiv.org Machine LearningFeb-12-2026

Minimizing the inference cost and latency of foundation models has become a crucial area of research. Optimization approaches include theoretically lossless methods and others without accuracy guarantees like quantization. In all of these cases it is crucial to ensure that the model quality has not degraded. However, even at temperature zero, model generations are not necessarily robust even to theoretically lossless model optimizations due to numerical errors. We thus require statistical tools to decide whether a finite-sample accuracy deviation is an evidence of a model's degradation or whether it can be attributed to (harmless) noise in the evaluation. We propose a statistically sound hypothesis testing framework based on McNemar's test allowing to efficiently detect model degradations, while guaranteeing a controlled rate of false positives. The crucial insight is that we have to confront the model scores on each sample, rather than aggregated on the task level. Furthermore, we propose three approaches to aggregate accuracy estimates across multiple benchmarks into a single decision. We provide an implementation on top of the largely adopted open source LM Evaluation Harness and provide a case study illustrating that the method correctly flags degraded models, while not flagging model optimizations that are provably lossless. We find that with our tests even empirical accuracy degradations of 0.3% can be confidently attributed to actual degradations rather than noise.

large language model, machine learning, natural language, (20 more...)

2602.10144

Country:

Europe > Austria > Vienna (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > Experimental Study (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)

Neural Information Processing SystemsFeb-11-2026, 10:07:08 GMT

d827f12e35eae370ba9c65b7f6026695-Paper.pdf

algorithm, map inference problem, objective function value, (14 more...)

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Industry: Information Technology (0.48)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)

Neural Information Processing SystemsFeb-9-2026, 05:03:45 GMT

Learning Discrete Latent Variable Structures with Tensor Rank Conditions Zhengming Chen

Unobserved discrete data are ubiquitous in many scientific disciplines, and how to learn the causal structure of these latent variables is crucial for uncovering data patterns. Most studies focus on the linear latent variable model or impose strict constraints on latent structures, which fail to address cases in discrete data involving non-linear relationships or complex latent structures.

artificial intelligence, bayesian inference, machine learning, (15 more...)

Country:

North America > United States (0.14)
Africa > Senegal > Kolda Region > Kolda (0.04)
Asia > China > Guangdong Province > Shantou (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Machine LearningNov-6-2025

Unifying Information-Theoretic and Pair-Counting Clustering Similarity

Gates, Alexander J.

Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate information across full cluster contingency tables. Prior work has uncovered parallels between these families and applied empirical normalization or chance-correction schemes, but their deeper analytical connection remains only partially understood. Here, we develop an analytical framework that unifies these families through two complementary perspectives. First, both families are expressed as weighted expansions of observed versus expected co-occurrences, with pair-counting arising as a quadratic, low-order approximation and information-theoretic measures as higher-order, frequency-weighted extensions. Second, we generalize pair-counting to $k$-tuple agreement and show that information-theoretic measures can be viewed as systematically accumulating higher-order co-assignment structure beyond the pairwise level. We illustrate the approaches analytically for the Rand index and Mutual Information, and show how other indices in each family emerge as natural extensions. Together, these views clarify when and why the two regimes diverge, relating their sensitivities directly to weighting and approximation order, and provide a principled basis for selecting, interpreting, and extending clustering similarity measures across applications.

agreement, artificial intelligence, machine learning, (14 more...)

2511.03

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)

Neural Information Processing SystemsOct-10-2025, 16:23:01 GMT

c7313c42bc9536889fc0af617c384fa6-Paper-Conference.pdf

agent, constraint, odm, (16 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > District of Columbia > Washington (0.05)
(2 more...)

Genre: Research Report (0.93)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Neural Information Processing SystemsOct-9-2025, 20:30:40 GMT

1f530eef1ae1d4d4f4e0f51437976395-Paper-Conference.pdf

causal cluster, latent variable, tensor rank condition, (12 more...)

Country:

North America > United States (0.14)
Africa > Senegal > Kolda Region > Kolda (0.04)
Asia > China > Guangdong Province > Shantou (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Dombowsky, Alexander, Dunson, David B.

Learning Discrete Bayesian Networks with Hierarchical Dirichlet Shrinkage

arXiv.org Machine LearningSep-17-2025

Discrete Bayesian networks (DBNs) provide a broadly useful framework for modeling dependence structures in multivariate categorical data. There is a vast literature on methods for inferring conditional probabilities and graphical structure in DBNs, but data sparsity and parametric assumptions are major practical issues. In this article, we detail a comprehensive Bayesian framework for learning DBNs. First, we propose a hierarchical prior for the conditional probabilities that enables complicated interactions between parent variables and stability in sparse regimes. We give a novel Markov chain Monte Carlo (MCMC) algorithm utilizing parallel Langevin proposals to generate exact posterior samples, avoiding the pitfalls of variational approximations. Moreover, we verify that the full conditional distribution of the concentration parameters is log-concave under mild conditions, facilitating efficient sampling. We then propose two methods for learning network structures, including parent sets, Markov blankets, and DAGs, from categorical data. The first cycles through individual edges each MCMC iteration, whereas the second updates the entire structure as a single step. We evaluate the accuracy, power, and MCMC performance of our methods on several simulation studies. Finally, we apply our methodology to uncover prognostic network structure from primary breast cancer samples.

conditional probability, papjq, probability, (17 more...)

2509.13267

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Orange County > Irvine (0.14)
North America > United States > North Carolina > Durham County > Durham (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Neural Information Processing SystemsAug-17-2025, 18:25:22 GMT

d827f12e35eae370ba9c65b7f6026695-Paper.pdf

algorithm, artificial intelligence, machine learning, (17 more...)

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Industry: Information Technology (0.48)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)

Assunção, Renato, Figueiredo, Flávio, Júnior, Francisco N. Tinoco, de Sá-Freire, Léo M., Silva, Fábio

An Interpretable Measure for Quantifying Predictive Dependence between Continuous Random Variables -- Extended Version

arXiv.org Machine LearningJan-18-2025

A fundamental task in statistical learning is quantifying the joint dependence or association between two continuous random variables. We introduce a novel, fully non-parametric measure that assesses the degree of association between continuous variables $X$ and $Y$, capable of capturing a wide range of relationships, including non-functional ones. A key advantage of this measure is its interpretability: it quantifies the expected relative loss in predictive accuracy when the distribution of $X$ is ignored in predicting $Y$. This measure is bounded within the interval [0,1] and is equal to zero if and only if $X$ and $Y$ are independent. We evaluate the performance of our measure on over 90,000 real and synthetic datasets, benchmarking it against leading alternatives. Our results demonstrate that the proposed measure provides valuable insights into underlying relationships, particularly in cases where existing methods fail to capture important dependencies.

artificial intelligence, machine learning, predep, (19 more...)

2501.10815

Country: South America > Brazil (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)