AITopics | Bayesian Learning

Collaborating Authors

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Foundations of Unknown-aware Machine Learning

Du, Xuefeng

arXiv.org Artificial IntelligenceMay-22-2025

Ensuring the reliability and safety of machine learning models in open-world deployment is a central challenge in AI safety. This thesis develops both algorithmic and theoretical foundations to address key reliability issues arising from distributional uncertainty and unknown classes, from standard neural networks to modern foundation models like large language models (LLMs). Traditional learning paradigms, such as empirical risk minimization (ERM), assume no distribution shift between training and inference, often leading to overconfident predictions on out-of-distribution (OOD) inputs. This thesis introduces novel frameworks that jointly optimize for in-distribution accuracy and reliability to unseen data. A core contribution is the development of an unknown-aware learning framework that enables models to recognize and handle novel inputs without labeled OOD data. We propose new outlier synthesis methods, VOS, NPOS, and DREAM-OOD, to generate informative unknowns during training. Building on this, we present SAL, a theoretical and algorithmic framework that leverages unlabeled in-the-wild data to enhance OOD detection under realistic deployment conditions. These methods demonstrate that abundant unlabeled data can be harnessed to recognize and adapt to unforeseen inputs, providing formal reliability guarantees. The thesis also extends reliable learning to foundation models. We develop HaloScope for hallucination detection in LLMs, MLLMGuard for defending against malicious prompts in multimodal models, and data cleaning methods to denoise human feedback used for better alignment. These tools target failure modes that threaten the safety of large-scale models in deployment. Overall, these contributions promote unknown-aware learning as a new paradigm, and we hope it can advance the reliability of AI systems with minimal human efforts.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.14933

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Education (1.00)
Transportation > Ground > Road (0.67)
Information Technology > Security & Privacy (0.45)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(3 more...)

Add feedback

Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles

Liang, Xinzhu, Lukens, Joseph M., Lohani, Sanjaya, Kirby, Brian T., Searles, Thomas A., Qiu, Xin, Law, Kody J. H.

arXiv.org Machine LearningMay-21-2025

This work introduces a new method called scalable Bayesian Monte Carlo (SBMC). The model interpolates between a point estimator and the posterior, and the algorithm is a parallel implementation of a consistent (asymptotically unbiased) Bayesian deep learning algorithm: sequential Monte Carlo (SMC) or Markov chain Monte Carlo (MCMC). The method is motivated theoretically, and its utility is demonstrated on practical examples: MNIST, CIFAR, IMDb. A systematic numerical study reveals that parallel implementations of SMC and MCMC are comparable to serial implementations in terms of performance and total cost, and they achieve accuracy at or beyond the state-of-the-art (SOTA) methods like deep ensembles at convergence, along with substantially improved uncertainty quantification (UQ)--in particular, epistemic UQ. But even parallel implementations are expensive, with an irreducible time barrier much larger than the cost of the MAP estimator. Compressing time further leads to rapid degradation of accuracy, whereas UQ remains valuable. By anchoring to a point estimator we can recover accuracy, while retaining valuable UQ, ultimately delivering strong performance across metrics for a cost comparable to the SOTA.

artificial intelligence, machine learning, smc, (20 more...)

arXiv.org Machine Learning

2505.13585

Genre:

Research Report > New Finding (0.45)
Research Report > Experimental Study (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors

Nagler, Thomas, Rügamer, David

arXiv.org Machine LearningMay-21-2025

Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient sampling procedure to construct Bayesian posteriors for such estimates based on Martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the uncertainty quantification of our method in inference applications.

large language model, machine learning, posterior, (17 more...)

arXiv.org Machine Learning

2505.11325

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

A simple estimator of the correlation kernel matrix of a determinantal point process

Gouriéroux, Christian, Lu, Yang

arXiv.org Machine LearningMay-21-2025

Determinantal Point Process (DPP) is a flexible family of distributions for random sets defined on the finite state space { 1, ...,d }, or equivalently for multivariate binary variables. This family is parameterized by either the L-ensemble kernel Σ, which is symmetric positive definite (SPD), or the correlation kernel matrix K, which is SPD, with eigenvalues lying strictly between 0 and 1. The literature has considered the maximum likelihood estimation (MLE) of Σ and K or its algorithmic analogues (Affandi et al., 2014; Brunel et al., 2017a,b), but it has since been shown that i) the likelihood function has at least 2

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

2505.14529

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)

Add feedback

A system identification approach to clustering vector autoregressive time series

Yue, Zuogong, Wang, Xinyi, Solo, Victor

arXiv.org Machine LearningMay-21-2025

Clustering of time series based on their underlying dynamics is keeping attracting researchers due to its impacts on assisting complex system modelling. Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction, where the autocorrelation pattern/feature is mostly ignored. Instead of relying on heuristic feature/metric construction, the system identification approach allows treating vector time series clustering by explicitly considering their underlying autoregressive dynamics. We first derive a clustering algorithm based on a mixture autoregressive model. Unfortunately it turns out to have significant computational problems. We then derive a `small-noise' limiting version of the algorithm, which we call k-LMVAR (Limiting Mixture Vector AutoRegression), that is computationally manageable. We develop an associated BIC criterion for choosing the number of clusters and model order. The algorithm performs very well in comparative simulations and also scales well computationally.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2505.14421

Country:

Asia > China (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California (0.04)
Europe > Monaco (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Local Minima Prediction using Dynamic Bayesian Filtering for UGV Navigation in Unstructured Environments

Lee, Seung Hun, Jo, Wonse, Robert, Lionel P. Jr., Tilbury, Dawn M.

arXiv.org Artificial IntelligenceMay-21-2025

Path planning is crucial for the navigation of autonomous vehicles, yet these vehicles face challenges in complex and real-world environments. Although a global view may be provided, it is often outdated, necessitating the reliance of Unmanned Ground Vehicles (UGVs) on real-time local information. This reliance on partial information, without considering the global context, can lead to UGVs getting stuck in local minima. This paper develops a method to proactively predict local minima using Dynamic Bayesian filtering, based on the detected obstacles in the local view and the global goal. This approach aims to enhance the autonomous navigation of self-driving vehicles by allowing them to predict potential pitfalls before they get stuck, and either ask for help from a human, or re-plan an alternate trajectory.

artificial intelligence, machine learning, obstacle, (16 more...)

arXiv.org Artificial Intelligence

2505.14337

Country: North America > United States > Michigan (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.62)

Add feedback

Understanding Task Representations in Neural Networks via Bayesian Ablation

Nam, Andrew, Campbell, Declan, Griffiths, Thomas, Cohen, Jonathan, Leslie, Sarah-Jane

arXiv.org Artificial IntelligenceMay-21-2025

Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging due to their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Artificial Intelligence

2505.13742

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

The Gaussian Latent Machine: Efficient Prior and Posterior Sampling for Inverse Problems

Kuric, Muhamed, Zach, Martin, Habring, Andreas, Unser, Michael, Pock, Thomas

arXiv.org Machine LearningMay-20-2025

We consider the problem of sampling from a product-of-experts-type model that encompasses many standard prior and posterior distributions commonly found in Bayesian imaging. We show that this model can be easily lifted into a novel latent variable model, which we refer to as a Gaussian latent machine. This leads to a general sampling approach that unifies and generalizes many existing sampling algorithms in the literature. Most notably, it yields a highly efficient and effective two-block Gibbs sampling approach in the general case, while also specializing to direct sampling algorithms in particular cases. Finally, we present detailed numerical experiments that demonstrate the efficiency and effectiveness of our proposed sampling approach across a wide range of prior and posterior sampling problems from Bayesian imaging.

artificial intelligence, experiment, machine learning, (20 more...)

arXiv.org Machine Learning

2505.12836

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Add feedback

Testing Identifiability and Transportability with Observational and Experimental Data

Lelova, Konstantina, Cooper, Gregory F., Triantafillou, Sofia

arXiv.org Machine LearningMay-20-2025

Transporting causal information learned from experiments in one population to another is a critical challenge in clinical research and decision-making. Causal transportability uses causal graphs to model differences between the source and target populations and identifies conditions under which causal effects learned from experiments can be reused in a different population. Similarly, causal identifiability identifies conditions under which causal effects can be estimated from observational data. However, these approaches rely on knowing the causal graph, which is often unavailable in real-world settings. In this work, we propose a Bayesian method for assessing whether Z-specific (conditional) causal effects are both identifiable and transportable, without knowing the causal graph. Our method combines experimental data from the source population with observational data from the target population to compute the probability that a causal effect is both identifiable from observational data and transportable. When this holds, we leverage both observational data from the target domain and experimental data from the source domain to obtain an unbiased, efficient estimator of the causal effect in the target population. Using simulations, we demonstrate that our method correctly identifies transportable causal effects and improves causal effect estimation.

artificial intelligence, backdoor, machine learning, (12 more...)

arXiv.org Machine Learning

2505.12801

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Greece (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Rapidly Varying Completely Random Measures for Modeling Extremely Sparse Networks

Kilian, Valentin, Guedj, Benjamin, Caron, François

arXiv.org Machine LearningMay-20-2025

Completely random measures (CRMs) are fundamental to Bayesian nonparametric models, with applications in clustering, feature allocation, and network analysis. A key quantity of interest is the Laplace exponent, whose asymptotic behavior determines how the random structures scale. When the Laplace exponent grows nearly linearly - known as rapid variation - the induced models exhibit approximately linear growth in the number of clusters, features, or edges with sample size or network nodes. This regime is especially relevant for modeling sparse networks, yet existing CRM constructions lack tractability under rapid variation. We address this by introducing a new class of CRMs with index of variation $α\in(0,1]$, defined as mixtures of stable or generalized gamma processes. These models offer interpretable parameters, include well-known CRMs as limiting cases, and retain analytical tractability through a tractable Laplace exponent and simple size-biased representation. We analyze the asymptotic properties of this CRM class and apply it to the Caron-Fox framework for sparse graphs. The resulting models produce networks with near-linear edge growth, aligning with empirical evidence from large-scale networks. Additionally, we present efficient algorithms for simulation and posterior inference, demonstrating practical advantages through experiments on real-world sparse network datasets.

artificial intelligence, machine learning, social media, (19 more...)

arXiv.org Machine Learning

2505.13206

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback