AITopics | Bayesian Learning

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)

Last Layer Hamiltonian Monte Carlo

Vellenga, Koen, Steinhauer, H. Joe, Falkman, Göran, Andersson, Jonas, Sjögren, Anders

arXiv.org Artificial Intelligence, Jul-15-2025

We explore the use of Hamiltonian Monte Carlo (HMC) sampling as a probabilistic last layer approach for deep neural networks (DNNs). While HMC is widely regarded as a gold standard for uncertainty estimation, its computational demands limit its application to large-scale datasets and large DNN architectures. Although the predictions from the sampled DNN parameters can be parallelized, the computational cost still scales linearly with the number of samples (similar to an ensemble). Last layer HMC (LL-HMC) reduces the required computations by restricting the HMC sampling to the final layer of a DNN, making it applicable to more data-intensive scenarios with limited computational resources. In this paper, we compare LL-HMC against five last layer probabilistic deep learning (LL-PDL) methods across three real-world video datasets for driver action and intention. We evaluate the in-distribution classification performance, calibration, and out-of-distribution (OOD) detection. Due to the stochastic nature of the probabilistic evaluations, we performed five grid searches for different random seeds to avoid being reliant on a single initialization for the hyperparameter configurations. The results show that LL-HMC achieves competitive in-distribution classification and OOD detection performance. Additional sampled last layer parameters do not improve the classification performance, but can improve the OOD detection. Multiple chains or starting positions did not yield consistent improvements.
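The idea of restricting HMC to the final layer can be illustrated with a minimal sketch. This is not the authors' implementation: the frozen features `FEATURES`, the labels, the Gaussian prior scale `PRIOR_STD`, the binary logistic last layer, and all sampler settings are assumptions chosen only to keep the example small and runnable.

```python
import math
import random

# Hypothetical frozen features phi(x) = [1, x] produced by a trained network body.
# Only the last-layer weights w are sampled; everything upstream stays fixed.
FEATURES = [[1.0, x] for x in (-2.0, -1.5, -1.0, 1.0, 1.5, 2.0)]
LABELS = [0, 0, 0, 1, 1, 1]
PRIOR_STD = 2.0  # assumed Gaussian prior scale on the weights


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_post(w):
    # Unnormalized log posterior: Bernoulli log-likelihood plus Gaussian log-prior.
    total = -sum(wi * wi for wi in w) / (2.0 * PRIOR_STD ** 2)
    for phi, y in zip(FEATURES, LABELS):
        z = sum(wi * fi for wi, fi in zip(w, phi))
        # log sigmoid(z) = -softplus(-z); log(1 - sigmoid(z)) = -softplus(z)
        t = -z if y == 1 else z
        total -= max(t, 0.0) + math.log1p(math.exp(-abs(t)))
    return total


def grad_log_post(w):
    # Gradient of log_post with respect to the last-layer weights.
    grad = [-wi / PRIOR_STD ** 2 for wi in w]
    for phi, y in zip(FEATURES, LABELS):
        z = sum(wi * fi for wi, fi in zip(w, phi))
        err = y - sigmoid(z)
        for j, fj in enumerate(phi):
            grad[j] += err * fj
    return grad


def hmc_last_layer(n_samples=300, burn_in=100, step=0.1, n_leapfrog=20, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    samples, accepted = [], 0
    for it in range(n_samples + burn_in):
        p = [rng.gauss(0.0, 1.0) for _ in w]  # resample momentum
        h_old = -log_post(w) + 0.5 * sum(pi * pi for pi in p)
        q, g = list(w), grad_log_post(w)
        for _ in range(n_leapfrog):  # leapfrog integration
            p = [pi + 0.5 * step * gi for pi, gi in zip(p, g)]
            q = [qi + step * pi for qi, pi in zip(q, p)]
            g = grad_log_post(q)
            p = [pi + 0.5 * step * gi for pi, gi in zip(p, g)]
        h_new = -log_post(q) + 0.5 * sum(pi * pi for pi in p)
        # Metropolis correction on the change in total energy.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            w = q
            accepted += 1
        if it >= burn_in:
            samples.append(list(w))
    return samples, accepted / (n_samples + burn_in)


def predict(samples, x):
    # Posterior predictive: average the class-1 probability over sampled weights.
    return sum(sigmoid(w[0] + w[1] * x) for w in samples) / len(samples)
```

Calling `hmc_last_layer()` returns the retained weight samples and the acceptance rate; `predict(samples, x)` averages over them, which is where the cost grows linearly with the number of samples, as the abstract notes.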

artificial intelligence, bayesian inference, machine learning, (18 more...)

2507.08905

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology (0.67)
Automobiles & Trucks (0.46)
Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
(2 more...)

CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk

Azizi, Ilia, Bodik, Juraj, Heiss, Jakob, Yu, Bin

arXiv.org Machine Learning, Jul-14-2025

Accurate uncertainty quantification is critical for reliable predictive modeling, especially in regression tasks. Existing methods typically address either aleatoric uncertainty from measurement noise or epistemic uncertainty from limited data, but not necessarily both in a balanced way. We propose CLEAR, a calibration method with two distinct parameters, $γ_1$ and $γ_2$, to combine the two uncertainty components for improved conditional coverage. CLEAR is compatible with any pair of aleatoric and epistemic estimators; we show how it can be used with (i) quantile regression for aleatoric uncertainty and (ii) ensembles drawn from the Predictability-Computability-Stability (PCS) framework for epistemic uncertainty. Across 17 diverse real-world datasets, CLEAR achieves an average improvement of 28.2% and 17.4% in the interval width compared to the two individually calibrated baselines while maintaining nominal coverage. This improvement can be particularly evident in scenarios dominated by either high epistemic or high aleatoric uncertainty.
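A rough sketch of how two calibration parameters might combine an aleatoric and an epistemic width is given below. The abstract does not state the combination rule, so the additive half-width `gamma1 * a + gamma2 * e`, the fixed ratio `lam = gamma2 / gamma1`, the split-conformal choice of `gamma1`, and the synthetic calibration data are all assumptions made for illustration, not the published CLEAR procedure.

```python
import math
import random


def calibrate_gamma1(residuals, aleatoric, epistemic, lam, alpha=0.1):
    # Conformity score: absolute residual scaled by the combined width a + lam * e.
    scores = sorted(abs(r) / (a + lam * e)
                    for r, a, e in zip(residuals, aleatoric, epistemic))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # split-conformal rank
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    return scores[k - 1]


def interval(pred, a, e, gamma1, gamma2):
    # Assumed additive combination of the two uncertainty components.
    half = gamma1 * a + gamma2 * e
    return pred - half, pred + half


# Synthetic calibration set (hypothetical): per-point aleatoric and epistemic
# widths, and residuals y - prediction drawn with a matching spread.
rng = random.Random(0)
ale = [0.5 + rng.random() for _ in range(20)]
epi = [0.1 + rng.random() for _ in range(20)]
res = [rng.gauss(0.0, a + e) for a, e in zip(ale, epi)]

lam = 0.5  # assumed ratio gamma2 / gamma1
gamma1 = calibrate_gamma1(res, ale, epi, lam)
gamma2 = lam * gamma1
```

With the residuals measured against a prediction of 0, `interval(0.0, a, e, gamma1, gamma2)` gives one interval per calibration point, and roughly a 1 - alpha fraction of the residuals falls inside. In a fuller treatment `lam` would itself be selected on held-out data.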

aleatoric uncertainty, data mining, machine learning, (17 more...)

2507.0815

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States > Iowa > Story County > Ames (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Government (0.67)
Banking & Finance > Real Estate (0.67)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting

Liu, Haojie, Lin, Zihan

arXiv.org Machine Learning, Jul-14-2025

We introduce Galerkin-ARIMA, a novel time-series forecasting framework that integrates Galerkin projection techniques with the classical ARIMA model to capture potentially nonlinear dependencies in lagged observations. By replacing the fixed linear autoregressive component with a spline-based basis expansion, Galerkin-ARIMA flexibly approximates the underlying relationship among past values via ordinary least squares, while retaining the moving-average structure and Gaussian innovation assumptions of ARIMA. We derive closed-form solutions for both the AR and MA components using two-stage Galerkin projections, establish conditions for asymptotic unbiasedness and consistency, and analyze the bias-variance trade-off under basis-size growth. Complexity analysis reveals that, for moderate basis dimensions, our approach can substantially reduce computational cost compared to maximum-likelihood ARIMA estimation. Through extensive simulations on four synthetic processes (noisy ARMA, seasonal, trend-AR, and nonlinear recursion series), we demonstrate that Galerkin-ARIMA matches or closely approximates ARIMA's forecasting accuracy while achieving orders-of-magnitude speedups in rolling forecasting tasks. These results suggest that Galerkin-ARIMA offers a powerful, efficient alternative for modeling complex time series dynamics in high-volume or real-time applications.
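The first stage described above, a basis expansion of lagged values fitted by ordinary least squares, can be sketched as follows. This is a simplified illustration under stated assumptions: a quadratic polynomial basis on a single lag stands in for the spline basis, the moving-average stage is omitted entirely, and the noise-free logistic map is a hypothetical test series, not one of the paper's simulation settings.

```python
def basis(y):
    # Polynomial basis on the lag-1 value (a stand-in for a spline basis).
    return [1.0, y, y * y]


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a x = b.
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar_basis(series):
    # OLS via the normal equations (Phi^T Phi) beta = Phi^T y,
    # regressing each value on the basis expansion of its predecessor.
    rows = [basis(v) for v in series[:-1]]
    targets = series[1:]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(gram, rhs)


def forecast_one_step(beta, last_value):
    # One-step-ahead forecast from the fitted coefficients.
    return sum(b * f for b, f in zip(beta, basis(last_value)))


def logistic_map_series(n=200, y0=0.3, r=3.9):
    # Nonlinear recursion y_t = r * y_{t-1} * (1 - y_{t-1}), used only as test data.
    ys = [y0]
    for _ in range(n - 1):
        ys.append(r * ys[-1] * (1.0 - ys[-1]))
    return ys
```

Because the fit is a closed-form linear solve rather than an iterative likelihood maximization, refitting inside a rolling window is cheap, which is the source of the speedups the abstract reports. A linear AR(1) fit could not represent this recursion, while the quadratic basis recovers it.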

artificial intelligence, machine learning, modeling & simulation, (17 more...)

2507.07469

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (1.00)
North America > United States > California > Riverside County > Riverside (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Modeling & Simulation (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation

Alimohammadi, Yeganeh, Asgari, Kiana

arXiv.org Machine Learning, Jul-14-2025

The Mallows model is a widely used probabilistic framework for learning from ranking data, with applications ranging from recommendation systems and voting to aligning language models with human preferences [chen2024mallows, kleinberg2021algorithmic, rafailov2024direct]. Under this model, observed rankings are noisy perturbations of a central ranking $σ$, with likelihood decaying exponentially in distance from $σ$, i.e., $P(π) \propto \exp\big(-β \cdot d(π, σ)\big)$, where $β > 0$ controls dispersion and $d$ is a distance function. Existing methods mainly focus on fixed distances (such as Kendall's $τ$ distance), with no principled approach to learning the distance metric directly from data. In practice, however, rankings naturally vary by context; for instance, in some sports we regularly see long-range swaps (a low-rank team beating a high-rank one), while in others such events are rare. Motivated by this, we propose a generalization of the Mallows model that learns the distance metric directly from data. Specifically, we focus on $L_α$ distances: $d_α(π, σ) := \sum_{i=1} |π(i) - σ(i)|^α$. For any $α \geq 1$ and $β > 0$, we develop a Fully Polynomial-Time Approximation Scheme (FPTAS) to efficiently generate samples that are $ε$-close (in total variation distance) to the true distribution. Even in the special cases of $L_1$ and $L_2$, this generalizes prior results that required vanishing dispersion ($β \to 0$). Using this sampling algorithm, we propose an efficient Maximum Likelihood Estimation (MLE) algorithm that jointly estimates the central ranking, the dispersion parameter, and the optimal distance metric. We prove strong consistency results for our estimators (for any values of $α$ and $β$), and we validate our approach empirically using datasets from sports rankings.
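The likelihood and the $L_α$ distance defined above can be made concrete for very small rankings by brute-force enumeration. This sketch is not the paper's FPTAS sampler: it normalizes by summing over all n! permutations, which is only feasible for a handful of items, and the function names are chosen here for illustration.

```python
import itertools
import math


def l_alpha_distance(pi, sigma, alpha):
    # d_alpha(pi, sigma) = sum_i |pi(i) - sigma(i)|^alpha,
    # where pi[i] is the rank assigned to item i.
    return sum(abs(p - s) ** alpha for p, s in zip(pi, sigma))


def mallows_pmf(sigma, beta, alpha):
    # P(pi) proportional to exp(-beta * d_alpha(pi, sigma)),
    # normalized exactly by enumerating every permutation (small n only).
    perms = list(itertools.permutations(sigma))
    weights = [math.exp(-beta * l_alpha_distance(pi, sigma, alpha)) for pi in perms]
    z = sum(weights)
    return {pi: w / z for pi, w in zip(perms, weights)}
```

For example, with `sigma = (1, 2, 3, 4)` an adjacent swap `(2, 1, 3, 4)` has distance 2 under both $α = 1$ and $α = 2$, whereas the long-range swap `(4, 2, 3, 1)` has distance 6 under $α = 1$ and 18 under $α = 2$. Larger $α$ therefore penalizes long-range swaps more heavily relative to adjacent ones, which is the contextual behavior the learned metric is meant to capture.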

artificial intelligence, machine learning, permutation, (18 more...)

2507.08108

Country:

North America > United States > Michigan (0.04)
North America > United States > Oklahoma (0.04)
North America > United States > Ohio (0.04)
(9 more...)

Genre:

Workflow (0.93)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Basketball (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Efficient Causal Discovery for Autoregressive Time Series

Fesanghary, Mohammad, Gopal, Achintya

arXiv.org Artificial Intelligence, Jul-11-2025

Causal structure learning (CSL) in time series refers to the process of identifying and quantifying potentially time-lagged causal relationships among variables in a system. Unlike traditional time series analysis, which often focuses on prediction and correlation, CSL aims to uncover the cause-and-effect relationships that underlie the observed data. CSL is a crucial challenge in numerous fields such as economics, finance, healthcare, and natural science, where understanding the causal mechanisms can lead to more accurate forecasting, targeted interventions, and improved risk management. Causal structure learning poses significant challenges due to the presence of unobserved confounding factors, limited observational data, non-stationarity, and noise. Traditional CSL methods, which primarily focus on contemporaneous data, address some of these issues, but encounter considerable difficulties when extended to time series data.

artificial intelligence, graph, machine learning, (15 more...)

2507.07898

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Bayesian Double Descent

Polson, Nick, Sokolov, Vadim

arXiv.org Machine Learning, Jul-11-2025

Double descent is a phenomenon of over-parameterized statistical models. Our goal is to view double descent from a Bayesian perspective. Over-parameterized models such as deep neural networks have an interesting re-descending property in their risk characteristics. This is a recent phenomenon in machine learning and has been the subject of many studies. As the complexity of the model increases, there is a U-shaped region corresponding to the traditional bias-variance trade-off, but then as the number of parameters equals the number of observations and the model becomes one of interpolation, the risk can become infinite and then, in the over-parameterized region, it re-descends -- the double descent effect. We show that this has a natural Bayesian interpretation. Moreover, we show that it is not in conflict with the traditional Occam's razor that Bayesian models possess, in that they tend to prefer simpler models when possible. We illustrate the approach with an example of Bayesian model selection in neural networks. Finally, we conclude with directions for future research.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2507.07338

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Indiana > Hamilton County > Fishers (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Goal-Oriented Sequential Bayesian Experimental Design for Causal Learning

Zhang, Zheyu, Dong, Jiayuan, Liu, Jie, Huan, Xun

arXiv.org Machine Learning, Jul-11-2025

We present GO-CBED, a goal-oriented Bayesian framework for sequential causal experimental design. Unlike conventional approaches that select interventions aimed at inferring the full causal model, GO-CBED directly maximizes the expected information gain (EIG) on user-specified causal quantities of interest, enabling more targeted and efficient experimentation. The framework is both non-myopic, optimizing over entire intervention sequences, and goal-oriented, targeting only model aspects relevant to the causal query. To address the intractability of exact EIG computation, we introduce a variational lower bound estimator, optimized jointly through a transformer-based policy network and normalizing flow-based variational posteriors. The resulting policy enables real-time decision-making via an amortized network. We demonstrate that GO-CBED consistently outperforms existing baselines across various causal reasoning and discovery tasks, including synthetic structural causal models and semi-synthetic gene regulatory networks, particularly in settings with limited experimental budgets and complex causal mechanisms. Our results highlight the benefits of aligning experimental design objectives with specific research goals and of forward-looking sequential planning.
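The difference between information gain on the full model and on a quantity of interest can be seen in a toy discrete setting where the EIG is computable exactly. This is only an illustration of the objective, not GO-CBED itself: it is a single-step (myopic) calculation with no variational bound or policy network, and the three candidate models, the two designs, and the goal function `is_model_two` are hypothetical.

```python
import math


def mutual_information(joint):
    # joint[t][y] = p(t, y); returns I(T; Y) in nats.
    p_t = [sum(row) for row in joint]
    p_y = [sum(row[y] for row in joint) for y in range(len(joint[0]))]
    mi = 0.0
    for t, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log(p / (p_t[t] * p_y[y]))
    return mi


def eig_full_model(prior, lik_y1):
    # EIG about the full model index theta for a binary outcome y,
    # where lik_y1[theta] = p(y = 1 | theta, design).
    joint = [[p * (1.0 - q), p * q] for p, q in zip(prior, lik_y1)]
    return mutual_information(joint)


def eig_goal(prior, lik_y1, goal):
    # EIG about a quantity of interest z = goal(theta):
    # pool the joint mass of all models that share the same z.
    pooled = {}
    for theta, (p, q) in enumerate(zip(prior, lik_y1)):
        row = pooled.setdefault(goal(theta), [0.0, 0.0])
        row[0] += p * (1.0 - q)
        row[1] += p * q
    return mutual_information(list(pooled.values()))


def is_model_two(theta):
    # Hypothetical causal query: "is the true model number 2?"
    return theta == 2


PRIOR = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
DESIGN_A = [0.1, 0.9, 0.5]  # separates model 0 from model 1
DESIGN_B = [0.2, 0.2, 0.8]  # separates model 2 from the rest
```

Here `eig_full_model(PRIOR, DESIGN_A)` is clearly positive, yet `eig_goal(PRIOR, DESIGN_A, is_model_two)` is zero up to rounding, because the outcome distribution is the same whether or not model 2 is true. Design B is the informative choice for this query, which is the kind of distinction a goal-oriented objective is meant to exploit.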

artificial intelligence, bayesian inference, machine learning, (12 more...)

2507.07359

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights

Faisal, Mostafa Mohaimen Akand, Jhuma, Rabeya Amin, Jasim, Jamini

arXiv.org Artificial Intelligence, Jul-11-2025

The emergence of global health crises, such as COVID-19 and Monkeypox (mpox), has underscored the importance of understanding public sentiment to inform effective public health strategies. This study conducts a comparative sentiment analysis of public perceptions surrounding COVID-19 and mpox by leveraging extensive datasets of 147,475 and 106,638 tweets, respectively. Advanced machine learning models, including Logistic Regression, Naive Bayes, RoBERTa, DistilRoBERTa and XLNet, were applied to perform sentiment classification, with results indicating key trends in public emotion and discourse. The analysis highlights significant differences in public sentiment driven by disease characteristics, media representation, and pandemic fatigue. Through the lens of sentiment polarity and thematic trends, this study offers valuable insights into tailoring public health messaging, mitigating misinformation, and fostering trust during concurrent health crises. The findings contribute to advancing sentiment analysis applications in public health informatics, setting the groundwork for enhanced real-time monitoring and multilingual analysis in future research.

machine learning, natural language, sentiment, (17 more...)

2505.0743

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report > New Finding (0.89)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Bayesian Invariance Modeling of Multi-Environment Data

Wu, Luhuan, Yin, Mingzhang, Wang, Yixin, Cunningham, John P., Blei, David M.

arXiv.org Machine Learning, Jul-10-2025

Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features, that is, those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here we develop Bayesian Invariant Prediction (BIP), a probabilistic model for invariant prediction. BIP encodes the indices of invariant features as a latent variable and recovers them by posterior inference. Under the assumptions of Peters et al. [2016], the BIP posterior targets the true invariant features. We prove that the posterior is consistent and that greater environment heterogeneity leads to faster posterior contraction. To handle many features, we design an efficient variational approximation called VI-BIP. In simulations and real data, we find that BIP and VI-BIP are more accurate and scalable than existing methods for invariant prediction.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2506.22675

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

stCEG: An R Package for Modelling Events over Spatial Areas Using Chain Event Graphs

Calley, Hollie, Williamson, Daniel

arXiv.org Machine Learning, Jul-10-2025

stCEG is an R package which allows a user to fully specify a Chain Event Graph (CEG) model from data and to produce interactive plots. It includes functions for the user to visualise spatial variables they wish to include in the model. There is also a web-based graphical user interface (GUI) provided, increasing ease of use for those without knowledge of R. We demonstrate stCEG using a dataset of homicides in London, which is included in the package. stCEG is the first software package for CEGs that allows for full model customisation.

artificial intelligence, ceg, machine learning, (16 more...)

2507.06726

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.50)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
