AITopics | Bayesian Inference

Collaborating Authors

Bayesian Inference

Bayes' Theorem allows a program to infer the probabilities of likely causes from the probabilities of their effects, when what it is given are the probabilities of effects, given the causes.

News Overviews Instructional Materials AI-Alerts Classics

Input Adaptive Bayesian Model Averaging

Slavutsky, Yuli, Salazar, Sebastian, Blei, David M.

arXiv.org Machine LearningOct-28-2025

This paper studies prediction with multiple candidate models, where the goal is to combine their outputs. This task is especially challenging in heterogeneous settings, where different models may be better suited to different inputs. We propose input adaptive Bayesian Model Averaging (IA-BMA), a Bayesian method that assigns model weights conditional on the input. IA-BMA employs an input adaptive prior, and yields a posterior distribution that adapts to each prediction, which we estimate with amortized variational inference. We derive formal guarantees for its performance, relative to any single predictor selected per input. We evaluate IABMA across regression and classification tasks, studying data from personalized cancer treatment, credit-card fraud detection, and UCI datasets. IA-BMA consistently delivers more accurate and better-calibrated predictions than both non-adaptive baselines and existing adaptive methods. Many applications require adaptive predictions. In personalized medicine, different patients respond differently to the same treatment (Mahajan et al., 2023); in fairness-sensitive domains, predictions need to adapt to subpopulations (Wang et al., 2019; Grother et al., 2019); and in fraud detection, behavioral data is often heteroskedastic and varies substantially across inputs (V armedja et al., 2019).

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Machine Learning

2510.22054

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California (0.04)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Fraud (0.89)
Health & Medicine > Therapeutic Area > Oncology (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Schrodinger Neural Network and Uncertainty Quantification: Quantum Machine

Hammad, M. M.

arXiv.org Artificial IntelligenceOct-28-2025

We introduce the Schrodinger Neural Network (SNN), a principled architecture for conditional density estimation and uncertainty quantification inspired by quantum mechanics. The SNN maps each input to a normalized wave function on the output domain and computes predictive probabilities via the Born rule. The SNN departs from standard parametric likelihood heads by learning complex coefficients of a spectral expansion (e . g ., Chebyshev polynomials) whose squared modulus yields the conditional density $p(y|x)=\left| ψ_x(y)\right| {}^2$ with analytic normalization. This representation confers three practical advantages: positivity and exact normalization by construction, native multimodality through interference among basis modes without explicit mixture bookkeeping, and yields closed-form (or efficiently computable) functionals$-$such as moments and several calibration diagnostics$-$as quadratic forms in coefficient space. We develop the statistical and computational foundations of the SNN, including (i) training by exact maximum-likelihood with unit-sphere coefficient parameterization, (ii) physics-inspired quadratic regularizers (kinetic and potential energies) motivated by uncertainty relations between localization and spectral complexity, (iii) scalable low-rank and separable extensions for multivariate outputs, (iv) operator-based extensions that represent observables, constraints, and weak labels as self-adjoint matrices acting on the amplitude space, and (v) a comprehensive framework for evaluating multimodal predictions. The SNN provides a coherent, tractable framework to elevate probabilistic prediction from point estimates to physically inspired amplitude-based distributions.

artificial intelligence, conditional density, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2510.23449

Genre: Research Report (0.82)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

Symbolic Neural Generation with Applications to Lead Discovery in Drug Design

Srinivasan, Ashwin, Baskar, A, Dash, Tirtharaj, Bain, Michael, Dey, Sanjay Kumar, Banerjee, Mainak

arXiv.org Artificial IntelligenceOct-28-2025

We investigate a relatively underexplored class of hybrid neurosymbolic models integrating symbolic learning with neural reasoning to construct data generators meeting formal correctness criteria. In \textit{Symbolic Neural Generators} (SNGs), symbolic learners examine logical specifications of feasible data from a small set of instances -- sometimes just one. Each specification in turn constrains the conditional information supplied to a neural-based generator, which rejects any instance violating the symbolic specification. Like other neurosymbolic approaches, SNG exploits the complementary strengths of symbolic and neural methods. The outcome of an SNG is a triple $(H, X, W)$, where $H$ is a symbolic description of feasible instances constructed from data, $X$ a set of generated new instances that satisfy the description, and $W$ an associated weight. We introduce a semantics for such systems, based on the construction of appropriate \textit{base} and \textit{fibre} partially-ordered sets combined into an overall partial order, and outline a probabilistic extension relevant to practical applications. In this extension, SNGs result from searching over a weighted partial ordering. We implement an SNG combining a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design. Our main interest is the description and the set of potential inhibitor molecules generated by the SNG. On benchmark problems -- where drug targets are well understood -- SNG performance is statistically comparable to state-of-the-art methods. On exploratory problems with poorly understood targets, generated molecules exhibit binding affinities on par with leading clinical candidates. Experts further find the symbolic specifications useful as preliminary filters, with several generated molecules identified as viable for synthesis and wet-lab testing.

large language model, logic & formal reasoning, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2510.23379

Country:

Asia > India (0.46)
Oceania > Australia (0.28)
North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.45)
Materials > Metals & Mining > Lead (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)

Add feedback

Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models

Vetter, Julius, Gloeckler, Manuel, Gedon, Daniel, Macke, Jakob H.

arXiv.org Artificial IntelligenceOct-28-2025

Simulation-based inference (SBI) offers a flexible and general approach to performing Bayesian inference: In SBI, a neural network is trained on synthetic data simulated from a model and used to rapidly infer posterior distributions for observed data. A key goal for SBI is to achieve accurate inference with as few simulations as possible, especially for expensive simulators. In this work, we address this challenge by repurposing recent probabilistic foundation models for tabular data: We show how tabular foundation models -- specifically TabPFN -- can be used as pre-trained autoregressive conditional density estimators for SBI. We propose Neural Posterior Estimation with Prior-data Fitted Networks (NPE-PFN) and show that it is competitive with current SBI approaches in terms of accuracy for both benchmark tasks and two complex scientific inverse problems. Crucially, it often substantially outperforms them in terms of simulation efficiency, sometimes requiring orders of magnitude fewer simulations. NPE-PFN eliminates the need for inference network selection, training, and hyperparameter tuning. We also show that it exhibits superior robustness to model misspecification and can be scaled to simulation budgets that exceed the context size limit of TabPFN. NPE-PFN provides a new direction for SBI, where training-free, general-purpose inference models offer efficient, easy-to-use, and flexible solutions for a wide range of stochastic inverse problems.

machine learning, natural language, simulation, (17 more...)

arXiv.org Artificial Intelligence

2504.1766

Country: Europe > Germany (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Exploring Structures of Inferential Mechanisms through Simplistic Digital Circuits

Sileno, Giovanni, Dessalles, Jean-Louis

arXiv.org Artificial IntelligenceOct-28-2025

Cognitive studies and artificial intelligence have developed distinct models for various inferential mechanisms (categorization, induction, abduction, causal inference, contrast, merge, ...). Yet, both natural and artificial views on cognition lack apparently a unifying framework. This paper formulates a speculative answer attempting to respond to this gap. To postulate on higher-level activation processes from a material perspective, we consider inferential mechanisms informed by symbolic AI modelling techniques, through the simplistic lenses of electronic circuits based on logic gates. We observe that a logic gate view entails a different treatment of implication and negation compared to standard logic and logic programming. Then, by combinatorial exploration, we identify four main forms of dependencies that can be realized by these inferential circuits. Looking at how these forms are generally used in the context of logic programs, we identify eight common inferential patterns, exposing traditionally distinct inferential mechanisms in an unifying framework. Finally, following a probabilistic interpretation of logic programs, we unveil inner functional dependencies. The paper concludes elaborating in what sense, even if our arguments are mostly informed by symbolic means and digital systems infrastructures, our observations may pinpoint to more generally applicable structures.

logic & formal reasoning, machine learning, mechanism, (21 more...)

arXiv.org Artificial Intelligence

2510.22883

Country:

North America > United States (0.46)
Europe > Netherlands (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Variational Polya Tree

Xu, Lu, Chan, Tsai Hor, Lam, Kwok Fai, Yu, Lequan, Yin, Guosheng

arXiv.org Artificial IntelligenceOct-28-2025

Density estimation is essential for generative modeling, particularly with the rise of modern neural networks. While existing methods capture complex data distributions, they often lack interpretability and uncertainty quantification. Bayesian nonparametric methods, especially the \polya tree, offer a robust framework that addresses these issues by accurately capturing function behavior over small intervals. Traditional techniques like Markov chain Monte Carlo (MCMC) face high computational complexity and scalability limitations, hindering the use of Bayesian nonparametric methods in deep learning. To tackle this, we introduce the variational \polya tree (VPT) model, which employs stochastic variational inference to compute posterior distributions. This model provides a flexible, nonparametric Bayesian prior that captures latent densities and works well with stochastic gradient optimization. We also leverage the joint distribution likelihood for a more precise variational posterior approximation than traditional mean-field methods. We evaluate the model performance on both real data and images, and demonstrate its competitiveness with other state-of-the-art deep density estimation methods. We also explore its ability in enhancing interpretability and uncertainty quantification. Code is available at https://github.com/howardchanth/var-polya-tree.

artificial intelligence, machine learning, vpt, (15 more...)

arXiv.org Artificial Intelligence

2510.22651

Country:

North America (0.46)
Asia > China (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning

Song, Bingqing, Li, Jiaxiang, Wang, Rong, Lu, Songtao, Hong, Mingyi

arXiv.org Artificial IntelligenceOct-28-2025

Pre-trained large language models have demonstrated a strong ability to learn from context, known as in-context learning (ICL). Despite a surge of recent applications that leverage such capabilities, it is by no means clear, at least theoretically, how the ICL capabilities arise, and in particular, what is the precise role played by key factors such as pre-training procedure as well as context construction. In this work, we propose a new framework to analyze the ICL performance, for a class of realistic settings, which includes network architectures, data encoding, data generation, and prompt construction process. As a first step, we construct a simple example with a one-layer transformer, and show an interesting result, namely when the pre-train data distribution is different from the query task distribution, a properly constructed context can shift the output distribution towards the query task distribution, in a quantifiable manner, leading to accurate prediction on the query topic. We then extend the findings in the previous step to a more general case, and derive the precise relationship between ICL performance, context length and the KL divergence between pre-train and query task distribution. Finally, we provide experiments to validate our theoretical results.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.22594

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
(2 more...)

Add feedback

Feasibility-Aware Decision-Focused Learning for Predicting Parameters in the Constraints

Mandi, Jayanta, Defresne, Marianne, Berden, Senne, Guns, Tias

arXiv.org Artificial IntelligenceOct-28-2025

When some parameters of a constrained optimization problem (COP) are uncertain, this gives rise to a predict-then-optimize (PtO) problem, comprising two stages: the prediction of the unknown parameters from contextual information and the subsequent optimization using those predicted parameters. Decision-focused learning (DFL) implements the first stage by training a machine learning (ML) model to optimize the quality of the decisions made using the predicted parameters. When the predicted parameters occur in the constraints, they can lead to infeasible solutions. Therefore, it is important to simultaneously manage both feasibility and decision quality. We develop a DFL framework for predicting constraint parameters in a generic COP. While prior works typically assume that the underlying optimization problem is a linear program (LP) or integer LP (ILP), our approach makes no such assumption. We derive two novel loss functions based on maximum likelihood estimation (MLE): the first one penalizes infeasibility (by penalizing predicted parameters that lead to infeasible solutions), while the second one penalizes suboptimal decisions (by penalizing predicted parameters that make the true optimal solution infeasible). We introduce a single tunable parameter to form a weighted average of the two losses, allowing decision-makers to balance suboptimality and feasibility. We experimentally demonstrate that adjusting this parameter provides decision-makers control over this trade-off. Moreover, across several COP instances, we show that adjusting the tunable parameter allows a decision-maker to prioritize either suboptimality or feasibility, outperforming the performance of existing baselines in either objective.

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.04951

Country: Europe > Belgium > Flanders (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Modeling Bottom-up Information Quality during Language Processing

Ding, Cui, Yin, Yanning, Jäger, Lena A., Wilcox, Ethan Gotlieb

arXiv.org Artificial IntelligenceOct-28-2025

Contemporary theories model language processing as integrating both top-down expectations and bottom-up inputs. One major prediction of such models is that the quality of the bottom-up inputs modulates ease of processing -- noisy inputs should lead to difficult and effortful comprehension. We test this prediction in the domain of reading. First, we propose an information-theoretic operationalization for the "quality" of bottom-up information as the mutual information (MI) between visual information and word identity. We formalize this prediction in a mathematical model of reading as a Bayesian update. Second, we test our operationalization by comparing participants' reading times in conditions where words' information quality has been reduced, either by occluding their top or bottom half, with full words. We collect data in English and Chinese. We then use multimodal language models to estimate the mutual information between visual inputs and words. We use these data to estimate the specific effect of reduced information quality on reading times. Finally, we compare how information is distributed across visual forms. In English and Chinese, the upper half contains more information about word identity than the lower half. However, the asymmetry is more pronounced in English, a pattern which is reflected in the reading times.

information, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.17047

Country:

Europe (0.68)
North America > United States (0.68)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
(2 more...)

Add feedback

Provable test-time adaptivity and distributional robustness of in-context learning

Ma, Tianyi, Wang, Tengyao, Samworth, Richard J.

arXiv.org Machine LearningOct-28-2025

We study in-context learning problems where a Transformer is pretrained on tasks drawn from a mixture distribution $π=\sum_{α\in\mathcal{A}} λ_α π_α$, called the pretraining prior, in which each mixture component $π_α$ is a distribution on tasks of a specific difficulty level indexed by $α$. Our goal is to understand the performance of the pretrained Transformer when evaluated on a different test distribution $μ$, consisting of tasks of fixed difficulty $β\in\mathcal{A}$, and with potential distribution shift relative to $π_β$, subject to the chi-squared divergence $χ^2(μ,π_β)$ being at most $κ$. In particular, we consider nonparametric regression problems with random smoothness, and multi-index models with random smoothness as well as random effective dimension. We prove that a large Transformer pretrained on sufficient data achieves the optimal rate of convergence corresponding to the difficulty level $β$, uniformly over test distributions $μ$ in the chi-squared divergence ball. Thus, the pretrained Transformer is able to achieve faster rates of convergence on easier tasks and is robust to distribution shift at test time. Finally, we prove that even if an estimator had access to the test distribution $μ$, the convergence rate of its expected risk over $μ$ could not be faster than that of our pretrained Transformers, thereby providing a more appropriate optimality guarantee than minimax lower bounds.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2510.23254

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback