AITopics

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.81)

Neural Information Processing SystemsSep-30-2025, 09:02:04 GMT

Clamping Variables and Approximate Inference

It was recently proved using graph covers (Ruozzi, 2012) that the Bethe partition function is upper bounded by the true partition function for a binary pairwise model that is attractive. Here we provide a new, arguably simpler proof from first principles. We make use of the idea of clamping a variable to a particular value. For an attractive model, we show that summing over the Bethe partition functions for each sub-model obtained after clamping any variable can only raise (and hence improve) the approximation. In fact, we derive a stronger result that may have other useful implications. Repeatedly clamping until we obtain a model with no cycles, where the Bethe approximation is exact, yields the result. We also provide a related lower bound on a broad class of approximate partition functions of general pairwise multi-label models that depends only on the topology. We demonstrate that clamping a few wisely chosen variables can be of practical value by dramatically reducing approximation error.

clamping variable and approximate inference, name change, partition function, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.44)
Information Technology > Artificial Intelligence > Machine Learning (0.44)

Neural Information Processing SystemsSep-30-2025, 08:48:55 GMT

Augur: Data-Parallel Probabilistic Modeling

Implementing inference procedures for each new probabilistic model is time-consuming and error-prone. Probabilistic programming addresses this problem by allowing a user to specify the model and then automatically generating the inference procedure. To make this practical it is important to generate high performance inference code. In turn, on modern architectures, high performance requires parallel execution. In this paper we present Augur, a probabilistic modeling language and compiler for Bayesian networks designed to make effective use of data-parallel architectures such as GPUs. We show that the compiler can generate data-parallel inference code scalable to thousands of GPU cores by making use of the conditional independence relationships in the Bayesian network.

augur, data-parallel probabilistic modeling, name change, (6 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)

Neural Information Processing SystemsSep-30-2025, 08:18:33 GMT

Distributed Estimation, Information Loss and Exponential Families

Distributed learning of probabilistic models from multiple data repositories with minimum communication is increasingly important. We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE, based on the whole dataset jointly. We study the statistical properties of this framework, showing that the loss of efficiency compared to the global setting relates to how much the underlying distribution families deviate from full exponential families, drawing connection to the theory of information loss by Fisher, Rao and Efron. We show that the full-exponential-family-ness represents the lower bound of the error rate of arbitrary combinations of local MLEs, and is achieved by a KL-divergence-based combination method but not by a more common linear combination method. We also study the empirical properties of the KL and linear combination methods, showing that the KL method significantly outperforms linear combination in practical settings with issues such as model misspecification, non-convexity, and heterogeneous data partitions.

estimation, information loss and exponential family, name change, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.78)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.60)

Neural Information Processing SystemsSep-30-2025, 08:18:30 GMT

Gaussian Process Volatility Model

The prediction of time-changing variances is an important task in the modeling of financial data. Standard econometric models are often limited as they assume rigid functional relationships for the evolution of the variance. Moreover, functional parameters are usually learned by maximum likelihood, which can lead to overfitting. To address these problems we introduce GP-Vol, a novel non-parametric model for time-changing variances based on Gaussian Processes. This new model can capture highly flexible functional relationships for the variances. Furthermore, we introduce a new online algorithm for fast inference in GP-Vol. This method is much faster than current offline inference procedures and it avoids overfitting problems by following a fully Bayesian approach. Experiments with financial data show that GP-Vol performs significantly better than current standard alternatives.

gaussian process volatility model, name change, variance, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.62)

Gao, Erdun, Sejdinovic, Dino

ActiveCQ: Active Estimation of Causal Quantities

Estimating causal quantities (CQs) typically requires large datasets, which can be expensive to obtain, especially when measuring individual outcomes is costly. This challenge highlights the importance of sample-efficient active learning strategies. To address the narrow focus of prior work on the conditional average treatment effect, we formalize the broader task of Actively estimating Causal Quantities (ActiveCQ) and propose a unified framework for this general problem. Built upon the insight that many CQs are integrals of regression functions, our framework models the regression function with a Gaussian Process. For the distribution component, we explore both a baseline using explicit density estimators and a more integrated method using conditional mean embeddings in a reproducing kernel Hilbert space. This latter approach offers key advantages: it bypasses explicit density estimation, operates within the same function space as the GP, and adaptively refines the distributional model after each update. Our framework enables the principled derivation of acquisition strategies from the CQ's posterior uncertainty; we instantiate this principle with two utility functions based on information gain and total variance reduction. A range of simulated and semi-synthetic experiments demonstrate that our principled framework significantly outperforms relevant baselines, achieving substantial gains in sample efficiency across a variety of CQs.

active estimation, activecq, dataset, (15 more...)

2509.24293

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
North America > United States > Virginia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.67)

Industry:

Health & Medicine > Therapeutic Area (0.92)
Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know

Li, Albus Yizhuo

The Mixture-of-Experts (MoE) architecture has enabled the creation of massive yet efficient Large Language Models (LLMs). However, the standard deterministic routing mechanism presents a significant limitation: its inherent brittleness is a key contributor to model miscalibration and overconfidence, resulting in systems that often do not know what they don't know. This thesis confronts this challenge by proposing a structured \textbf{Bayesian MoE routing framework}. Instead of forcing a single, deterministic expert selection, our approach models a probability distribution over the routing decision itself. We systematically investigate three families of methods that introduce this principled uncertainty at different stages of the routing pipeline: in the \textbf{weight-space}, the \textbf{logit-space}, and the final \textbf{selection-space}. Through a series of controlled experiments on a 3-billion parameter MoE model, we demonstrate that this framework significantly improves routing stability, in-distribution calibration, and out-of-distribution (OoD) detection. The results show that by targeting this core architectural component, we can create a more reliable internal uncertainty signal. This work provides a practical and computationally tractable pathway towards building more robust and self-aware LLMs, taking a crucial step towards making them know what they don't know.

calibration, experiment, router, (16 more...)

2509.2383

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
(3 more...)

Pillai, Srijesh, Chandrawat, Rajesh Kumar

Profit over Proxies: A Scalable Bayesian Decision Framework for Optimizing Multi-Variant Online Experiments

Online controlled experiments (A/B tests) are fundamental to data - driven decision - making in the digital economy. However, their real - world application is frequently compromised by two critical shortcomings: the use of statistically flawed heuristics like " p - value peeking", which inflates false positive rates, and an over - reliance on proxy metrics like conversion rates, which can lead to decisions that inadvertently harm core business profitability. This paper addresses these challenges by introducing a comp rehensive and scalable Bayesian decision framework designed for profit optimization in multi - variant (A/B/n) experiments. We propose a hierarchical Bayesian model that simultaneously estimates the probability of conversion (using a Beta - Bernoulli model) and the monetary value of that conversion (using a robust Bayesian model for the mean transaction value). Building on this, we employ a decision - theoretic stopping rule based on Expected Loss, enabling experiments to be concluded not only when a superior variant is identified but also when it becomes clear that no variant offers a practically significant improvement (stopping f or futility). The framework successfully navigates "revenue traps" where a variant with a higher conversion rate would have resulted in a net financial loss, correctly terminates futile experiments early to conserve resources, and maintains strict statisti cal integrity throughout the monitoring process. Ultimately, this work provides a practical and principled methodology for organizations to move beyond simple A/B testing towards a mature, profit - driven experimentation culture, ensuring that statistical conclusions translate directly to strategic busines s value.

experiment, posterior distribution, variant, (13 more...)

2509.22677

Country:

North America > United States > New York (0.04)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Italy (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Carey, Ryan, Ansanelli, Marina Maciel, Wolfe, Elie, Evans, Robin J.

Distinguishability of causal structures under latent confounding and selection

Statistical relationships in observed data can arise for several different reasons: the observed variables may be causally related, they may share a latent common cause, or there may be selection bias. Each of these scenarios can be modelled using different causal graphs. Not all such causal graphs, however, can be distinguished by experimental data. In this paper, we formulate the equivalence class of causal graphs as a novel graphical structure, the selected-marginalized directed graph (smDG). That is, we show that two directed acyclic graphs with latent and selected vertices have the same smDG if and only if they are indistinguishable, even when allowing for arbitrary interventions on the observed variables. As a substitute for the more familiar d-separation criterion for DAGs, we provide an analogous sound and complete separation criterion in smDGs for conditional independence relative to passive observations. Finally, we provide a series of sufficient conditions under which two causal structures are indistinguishable when there is only access to passive observations.

dag, smdg, vertex, (17 more...)

2509.20433

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.67)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Artificial IntelligenceSep-30-2025

Estimating the strength and timing of syntactic structure building in naturalistic reading

Wang, Nan, Li, Jiaxuan

A central question in psycholinguistics is the timing of syntax in sentence processing. Much of the existing evidence comes from violation paradigms, which conflate two separable processes - syntactic category detection and phrase structure construction - and implicitly assume that phrase structure follows category detection. In this study, we use co-registered EEG and eye-tracking data from the ZuCo corpus to disentangle these processes and test their temporal order under naturalistic reading conditions. Analyses of gaze transitions showed that readers preferentially moved between syntactic heads, suggesting that phrase structures, rather than serial word order, organize scanpaths. Bayesian network modeling further revealed that structural depth was the strongest driver of deviations from linear reading, outweighing lexical familiarity and surprisal. Finally, fixation-related potentials demonstrated that syntactic surprisal influences neural activity before word onset (-184 to -10 ms) and during early integration (48 to 300 ms). These findings extend current models of syntactic timing by showing that phrase structure construction can precede category detection and dominate lexical influences, supporting a predictive "tree-scaffolding" account of comprehension.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.23195

Country: North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)