AITopics | Bayesian Inference

Bayesian Inference

Bayes' Theorem allows a program to infer the probabilities of likely causes from observed effects, when what it is given are the probabilities of the effects conditional on the causes.
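As a minimal illustration of the inversion described above, the sketch below computes P(cause | effect) from P(effect | cause) and a prior P(cause). The function name `posterior` and the numbers (a 1% base rate, a 95% hit rate, a 5% false-positive rate) are hypothetical values chosen only for the example; they do not come from any of the listed papers.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem over a finite set of mutually exclusive causes.

    prior:      dict mapping cause -> P(cause)
    likelihood: dict mapping cause -> P(effect | cause)
    returns:    dict mapping cause -> P(cause | effect)
    """
    # Joint probability P(effect, cause) = P(effect | cause) * P(cause)
    joint = {c: prior[c] * likelihood[c] for c in prior}
    # Evidence P(effect) is the sum of the joint over all causes
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the effect has zero probability under every cause")
    return {c: j / evidence for c, j in joint.items()}


# Hypothetical numbers, for illustration only
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.95, "healthy": 0.05}  # P(positive test | cause)
post = posterior(prior, likelihood)
print(post)
```

With these assumed numbers the posterior probability of the rarer cause stays well below one half even though the test is fairly accurate, because the prior weighs heavily against it.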

Recursive Maximum Likelihood Estimation for Interacting Particle Systems using Virtual Particles

Sharrock, Louis, Kantas, Nikolas, Pavliotis, Grigorios A.

arXiv.org Machine Learning May-4-2026

We study recursive maximum likelihood estimation for stochastic interacting particle systems based on continuous observation of a single particle. In this regime, consistent estimation of the finite-particle log-likelihood is not possible, even in the limit as the number of particles $N\rightarrow\infty$ and the time horizon $t\rightarrow\infty$. We thus seek to optimise the stationary log-likelihood of the limiting mean-field system. We achieve this via a form of stochastic gradient estimate in continuous time, with stochastic gradient estimates computed using the continuous trajectory of the single observed particle, alongside a virtual interacting particle system and a virtual tangent interacting particle system, which are integrated with the online parameter estimate. For fixed numbers of real and virtual particles, we show that the resulting algorithms drive the gradient of a finite-particle surrogate objective to zero as $t\to\infty$. We then prove that, in the iterated limit $t\to\infty$ followed by $N,M\to\infty$, these surrogate gradients converge uniformly to the gradient of the stationary log-likelihood of the limiting mean-field system, yielding convergence to its stationary points. We illustrate the method on several numerical examples, including a model with quadratic confinement and interaction potentials, a model of interacting FitzHugh--Nagumo neurons, and a stochastic Kuramoto model.

artificial intelligence, machine learning, particle, (20 more...)

arXiv.org Machine Learning

2605.00786

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making

Himabindu Lakkaraju, Jure Leskovec

Neural Information Processing Systems May-1-2026, 05:46:36 GMT

We propose Confusions over Time (CoT), a novel generative framework which facilitates a multi-granular analysis of the decision making process. The CoT not only models the confusions or error properties of individual decision makers and their evolution over time, but also allows us to obtain diagnostic insights into the collective decision making process in an interpretable manner.

artificial intelligence, decision maker, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry:

Law (0.47)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Probabilistic Attention for Interactive Segmentation

Neural Information Processing Systems May-1-2026, 01:53:12 GMT

We provide a probabilistic interpretation of attention and show that the standard dot-product attention in transformers is a special case of Maximum A Posteriori (MAP) inference. The proposed approach suggests the use of Expectation Maximization algorithms for online adaptation of key and value model parameters. This approach is useful for cases in which external agents, e.g., annotators, provide inference-time information about the correct values of some tokens, e.g., the semantic category of some pixels, and we need for this new information to propagate to other tokens in a principled manner. We illustrate the approach on an interactive semantic segmentation task in which annotators and models collaborate online to improve annotation efficiency. Using standard benchmarks, we observe that key adaptation boosts model performance ( 10% mIoU) in the low feedback regime and value propagation improves model responsiveness in the high feedback regime.
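One way to get intuition for a probabilistic reading of attention is a standard identity, sketched below: if each key is treated as the mean of a unit-variance isotropic Gaussian with a uniform prior, and all keys have the same norm, then the posterior over components given a query equals the softmax of the query-key dot products. This is a textbook correspondence under those stated assumptions and is not claimed to be the paper's derivation; all function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(query, keys):
    """Unscaled dot-product attention weights: softmax(q . k_i)."""
    return softmax([dot(query, k) for k in keys])


def mixture_posterior(query, keys):
    """Posterior over components of a Gaussian mixture whose means are the
    keys (assumed: unit variance, uniform prior over components)."""
    log_lik = [-0.5 * sum((q - k) ** 2 for q, k in zip(query, key))
               for key in keys]
    return softmax(log_lik)


# Toy data (assumption: every key has norm 1, so norm terms cancel)
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
query = [0.5, 0.2]

w_att = attention_weights(query, keys)
w_post = mixture_posterior(query, keys)

# Attention output = expectation of the values under the posterior weights
output = [sum(w * v[d] for w, v in zip(w_att, values)) for d in range(2)]
print(w_att, w_post, output)
```

When the keys do not share a common norm, the two weightings differ by a per-key bias term, so the equality shown here relies on that assumption.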

computer vision, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)
(2 more...)

Tractable Regularization of Probabilistic Circuits

Neural Information Processing Systems May-1-2026, 01:51:49 GMT

Probabilistic Circuits (PCs) are a promising avenue for probabilistic modeling. They combine advantages of probabilistic graphical models (PGMs) with those of neural networks (NNs). Crucially, however, they are tractable probabilistic models, supporting efficient and exact computation of many probabilistic inference queries, such as marginals and MAP. Further, since PCs are structured computation graphs, they can take advantage of deep-learning-style parameter updates, which greatly improves their scalability. However, this innovation also makes PCs prone to overfitting, which has been observed in many standard benchmarks. Despite the existence of abundant regularization techniques for both PGMs and NNs, they are not effective enough when applied to PCs.

artificial intelligence, bayesian inference, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre:

Research Report (0.46)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)

On The Existence of The Adversarial Bayes Classifier

Neural Information Processing Systems May-1-2026, 01:50:55 GMT

While it has been the subject of several recent theoretical studies, many important questions related to adversarial robustness are still open.

artificial intelligence, classifier, machine learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.52)

159f7fe5b51ecd663b85337e8e28ce65-Paper-Conference.pdf

Neural Information Processing Systems May-1-2026, 01:42:30 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

07fbde96bee50f4e09303fd4f877c2f3-Paper-Conference.pdf

Neural Information Processing Systems May-1-2026, 01:40:14 GMT

artificial intelligence, machine learning, prediction, (20 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

Neural Information Processing Systems May-1-2026, 01:33:10 GMT

While distributional reinforcement learning (DistRL) has been empirically effective, the question of when and why it is better than vanilla, non-distributional RL has remained unanswered. This paper explains the benefits of DistRL through the lens of small-loss bounds, which are instance-dependent bounds that scale with optimal achievable cost. Particularly, our bounds converge much faster than those from non-distributional approaches if the optimal cost is small. As warmup, we propose a distributional contextual bandit (DistCB) algorithm, which we show enjoys small-loss regret bounds and empirically outperforms the state-of-the-art on three real-world tasks. In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation. We prove that our algorithm enjoys novel small-loss PAC bounds in low-rank MDPs. As part of our analysis, we introduce the ℓ1 distributional eluder dimension which may be of independent interest. Then, in offline RL, we show that pessimistic DistRL enjoys small-loss PAC bounds that are novel to the offline setting and are more robust to bad single-policy coverage.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre:

Research Report (0.66)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Bayesian X-Learner: Calibrated Posterior Inference for Heterogeneous Treatment Effects under Heavy-Tailed Outcomes

Uehara, Eichi

arXiv.org Machine Learning May-1-2026

Conditional Average Treatment Effect (CATE) estimation in practice demands three properties simultaneously: heterogeneous effects τ(x), calibrated uncertainty over them, and robustness to the heavy tails that contaminate real outcome data. Meta-learners (Künzel et al., 2019) give (i); causal forests and BART give (i)-(ii) with Gaussian-tail assumptions; no widely used tool gives all three. We present Bayesian X-Learner, an X-Learner built on cross-fitted doubly robust pseudo-outcomes (Kennedy, 2020) with a full MCMC posterior over τ(x) via a Welsch redescending pseudo-likelihood. On Hill's IHDP benchmark the default configuration attains mean $\epsilon_{\mathrm{PEHE}}$ = 0.56 on 5 replications (lowest mean; differences from S-/T-/X-learners, full-config Causal BART, and a causal forest baseline are not significant at α = 0.05, and rank ordering is unstable at 10 replications -- IHDP comparisons are competitive rather than dominant). On contaminated "whale" DGPs with up to 20-25% tail density, a one-flag extension (contamination_severity) that selects a Huber-δ nuisance loss per Huber's minimax-δ relation recovers RMSE 0.13 with tight credible intervals (single-cross-fit 30-seed coverage 83% [Wilson 66%, 93%] at 20% density; modularBayes pooling with Bayesian-bootstrap nuisance draws restores nominal 95% coverage). We validate on the Hillstrom email-marketing RCT (N = 42,613), demonstrating consistent behaviour on real heavy-tailed outcome data, and report covariate-stratified τ(x) coverage across covariate quintiles to substantiate calibration for heterogeneous effects beyond scalar summaries. We draw a clean distinction between tails-as-contamination (handled by Welsch + Huber nuisance) and tails-as-signal (handled by a tail-aware CATE basis); an empirical probe confirms a tail-aware basis recovers $\tau_{\mathrm{tail}}$ with full subgroup coverage, while the library's Hill-estimator path is contamination-directed and should not be used for heterogeneous τ. We map six empirical boundaries (contamination ceiling, clean-data efficiency cost, basis sensitivity, sample size, treatment type, compute) and show where other tools are preferable. Code and reproducible benchmarks are released.
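To make the two robust losses named in the abstract concrete, here is a minimal sketch of the Huber loss and of one common parameterization of the Welsch loss, together with the Welsch influence function (its derivative). The function names, default tuning constants, and the exact scaling of the Welsch exponent are assumptions made for illustration; this is not the released library's implementation.

```python
import math


def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def welsch_loss(r, c=1.0):
    """Welsch loss in one common parameterization (assumed here):
    rho(r) = (c^2 / 2) * (1 - exp(-(r / c)^2)), bounded above by c^2 / 2."""
    return 0.5 * c * c * (1.0 - math.exp(-(r / c) ** 2))


def welsch_influence(r, c=1.0):
    """Derivative of welsch_loss with respect to r: r * exp(-(r / c)^2).
    It returns to zero for large |r|, which is what 'redescending' means."""
    return r * math.exp(-(r / c) ** 2)


# Compare how each loss treats a small residual and a gross outlier
for r in (0.5, 3.0, 100.0):
    print(r, huber_loss(r), welsch_loss(r), welsch_influence(r))
```

The Huber loss keeps growing linearly for large residuals, so an outlier retains a bounded but nonzero pull on the fit, whereas the Welsch influence decays to zero and an extreme residual is effectively ignored.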

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

2604.27394

Genre: Research Report > Experimental Study (0.92)

Industry: Health & Medicine (0.93)

Technology: