Bayesian Learning
Last Layer Hamiltonian Monte Carlo
Vellenga, Koen, Steinhauer, H. Joe, Falkman, Gรถran, Andersson, Jonas, Sjรถgren, Anders
We explore the use of Hamiltonian Monte Carlo (HMC) sampling as a probabilistic last layer approach for deep neural networks (DNNs). While HMC is widely regarded as a gold standard for uncertainty estimation, the computational demands limit its application to large-scale datasets and large DNN architectures. Although the predictions from the sampled DNN parameters can be parallelized, the computational cost still scales linearly with the number of samples (similar to an ensemble). Last layer HMC (LL--HMC) reduces the required computations by restricting the HMC sampling to the final layer of a DNN, making it applicable to more data-intensive scenarios with limited computational resources. In this paper, we compare LL-HMC against five last layer probabilistic deep learning (LL-PDL) methods across three real-world video datasets for driver action and intention. We evaluate the in-distribution classification performance, calibration, and out-of-distribution (OOD) detection. Due to the stochastic nature of the probabilistic evaluations, we performed five grid searches for different random seeds to avoid being reliant on a single initialization for the hyperparameter configurations. The results show that LL--HMC achieves competitive in-distribution classification and OOD detection performance. Additional sampled last layer parameters do not improve the classification performance, but can improve the OOD detection. Multiple chains or starting positions did not yield consistent improvements.
CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
Azizi, Ilia, Bodik, Juraj, Heiss, Jakob, Yu, Bin
Accurate uncertainty quantification is critical for reliable predictive modeling, especially in regression tasks. Existing methods typically address either aleatoric uncertainty from measurement noise or epistemic uncertainty from limited data, but not necessarily both in a balanced way. We propose CLEAR, a calibration method with two distinct parameters, $ฮณ_1$ and $ฮณ_2$, to combine the two uncertainty components for improved conditional coverage. CLEAR is compatible with any pair of aleatoric and epistemic estimators; we show how it can be used with (i) quantile regression for aleatoric uncertainty and (ii) ensembles drawn from the Predictability-Computability-Stability (PCS) framework for epistemic uncertainty. Across 17 diverse real-world datasets, CLEAR achieves an average improvement of 28.2% and 17.4% in the interval width compared to the two individually calibrated baselines while maintaining nominal coverage. This improvement can be particularly evident in scenarios dominated by either high epistemic or high aleatoric uncertainty.
Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting
We introduce Galerkin-ARIMA, a novel time-series forecasting framework that integrates Galerkin projection techniques with the classical ARIMA model to capture potentially nonlinear dependencies in lagged observations. By replacing the fixed linear autoregressive component with a spline-based basis expansion, Galerkin-ARIMA flexibly approximates the underlying relationship among past values via ordinary least squares, while retaining the moving-average structure and Gaussian innovation assumptions of ARIMA. We derive closed-form solutions for both the AR and MA components using two-stage Galerkin projections, establish conditions for asymptotic unbiasedness and consistency, and analyze the bias-variance trade-off under basis-size growth. Complexity analysis reveals that, for moderate basis dimensions, our approach can substantially reduce computational cost compared to maximum-likelihood ARIMA estimation. Through extensive simulations on four synthetic processes-including noisy ARMA, seasonal, trend-AR, and nonlinear recursion series-we demonstrate that Galerkin-ARIMA matches or closely approximates ARIMA's forecasting accuracy while achieving orders-of-magnitude speedups in rolling forecasting tasks. These results suggest that Galerkin-ARIMA offers a powerful, efficient alternative for modeling complex time series dynamics in high-volume or real-time applications.
Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation
Alimohammadi, Yeganeh, Asgari, Kiana
\textit{Mallows model} is a widely-used probabilistic framework for learning from ranking data, with applications ranging from recommendation systems and voting to aligning language models with human preferences~\cite{chen2024mallows, kleinberg2021algorithmic, rafailov2024direct}. Under this model, observed rankings are noisy perturbations of a central ranking $ฯ$, with likelihood decaying exponentially in distance from $ฯ$, i.e, $P (ฯ) \propto \exp\big(-ฮฒ\cdot d(ฯ, ฯ)\big),$ where $ฮฒ> 0$ controls dispersion and $d$ is a distance function. Existing methods mainly focus on fixed distances (such as Kendall's $ฯ$ distance), with no principled approach to learning the distance metric directly from data. In practice, however, rankings naturally vary by context; for instance, in some sports we regularly see long-range swaps (a low-rank team beating a high-rank one), while in others such events are rare. Motivated by this, we propose a generalization of Mallows model that learns the distance metric directly from data. Specifically, we focus on $L_ฮฑ$ distances: $d_ฮฑ(ฯ,ฯ):=\sum_{i=1} |ฯ(i)-ฯ(i)|^ฮฑ$. For any $ฮฑ\geq 1$ and $ฮฒ>0$, we develop a Fully Polynomial-Time Approximation Scheme (FPTAS) to efficiently generate samples that are $ฮต$- close (in total variation distance) to the true distribution. Even in the special cases of $L_1$ and $L_2$, this generalizes prior results that required vanishing dispersion ($ฮฒ\to0$). Using this sampling algorithm, we propose an efficient Maximum Likelihood Estimation (MLE) algorithm that jointly estimates the central ranking, the dispersion parameter, and the optimal distance metric. We prove strong consistency results for our estimators (for any values of $ฮฑ$ and $ฮฒ$), and we validate our approach empirically using datasets from sports rankings.
Efficient Causal Discovery for Autoregressive Time Series
Fesanghary, Mohammad, Gopal, Achintya
Causal structure learning (CSL) in time series refers to the process of identifying and quantifying potentially time-lagged causal relationships among variables in a system. Unlike traditional time series analysis, which often focuses on prediction and correlation, CSL aims to uncover the cause-and-effect relationships that underlie the observed data. CSL is a crucial challenge in numerous fields such as economics, finance, healthcare, and natural science, where understanding the causal mechanisms can lead to more accurate forecasting, targeted interventions, and improved risk management. Causal structure learning poses significant challenges due to the presence of unobserved confounding factors, limited observational data, non-stationarity, and noise. Traditional CSL methods, which primarily focus on contemporaneous data, address some of these issues, but encounter considerable difficulties when extended to time series data.
Bayesian Double Descent
Double descent is a phenomenon of over-parameterized statistical models. Our goal is to view double descent from a Bayesian perspective. Over-parameterized models such as deep neural networks have an interesting re-descending property in their risk characteristics. This is a recent phenomenon in machine learning and has been the subject of many studies. As the complexity of the model increases, there is a U-shaped region corresponding to the traditional bias-variance trade-off, but then as the number of parameters equals the number of observations and the model becomes one of interpolation, the risk can become infinite and then, in the over-parameterized region, it re-descends -- the double descent effect. We show that this has a natural Bayesian interpretation. Moreover, we show that it is not in conflict with the traditional Occam's razor that Bayesian models possess, in that they tend to prefer simpler models when possible. We illustrate the approach with an example of Bayesian model selection in neural networks. Finally, we conclude with directions for future research.
Goal-Oriented Sequential Bayesian Experimental Design for Causal Learning
Zhang, Zheyu, Dong, Jiayuan, Liu, Jie, Huan, Xun
We present GO-CBED, a goal-oriented Bayesian framework for sequential causal experimental design. Unlike conventional approaches that select interventions aimed at inferring the full causal model, GO-CBED directly maximizes the expected information gain (EIG) on user-specified causal quantities of interest, enabling more targeted and efficient experimentation. The framework is both non-myopic, optimizing over entire intervention sequences, and goal-oriented, targeting only model aspects relevant to the causal query. To address the intractability of exact EIG computation, we introduce a variational lower bound estimator, optimized jointly through a transformer-based policy network and normalizing flow-based variational posteriors. The resulting policy enables real-time decision-making via an amortized network. We demonstrate that GO-CBED consistently outperforms existing baselines across various causal reasoning and discovery tasks-including synthetic structural causal models and semi-synthetic gene regulatory networks-particularly in settings with limited experimental budgets and complex causal mechanisms. Our results highlight the benefits of aligning experimental design objectives with specific research goals and of forward-looking sequential planning.
Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights
Faisal, Mostafa Mohaimen Akand, Jhuma, Rabeya Amin, Jasim, Jamini
The emergence of global health crises, such as COVID-19 and Monkeypox (mpox), has underscored the importance of understanding public sentiment to inform effective public health strategies. This study conducts a comparative sentiment analysis of public perceptions surrounding COVID-19 and mpox by leveraging extensive datasets of 147,475 and 106,638 tweets, respectively. Advanced machine learning models, including Logistic Regression, Naive Bayes, RoBERTa, DistilRoBERTa and XLNet, were applied to perform sentiment classification, with results indicating key trends in public emotion and discourse. The analysis highlights significant differences in public sentiment driven by disease characteristics, media representation, and pandemic fatigue. Through the lens of sentiment polarity and thematic trends, this study offers valuable insights into tailoring public health messaging, mitigating misinformation, and fostering trust during concurrent health crises. The findings contribute to advancing sentiment analysis applications in public health informatics, setting the groundwork for enhanced real-time monitoring and multilingual analysis in future research.
Bayesian Invariance Modeling of Multi-Environment Data
Wu, Luhuan, Yin, Mingzhang, Wang, Yixin, Cunningham, John P., Blei, David M.
Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features - those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here we develop Bayesian Invariant Prediction (BIP), a probabilistic model for invariant prediction. BIP encodes the indices of invariant features as a latent variable and recover them by posterior inference. Under the assumptions of Peters et al. [2016], the BIP posterior targets the true invariant features. We prove that the posterior is consistent and that greater environment heterogeneity leads to faster posterior contraction. To handle many features, we design an efficient variational approximation called VI-BIP. In simulations and real data, we find that BIP and VI-BIP are more accurate and scalable than existing methods for invariant prediction.
stCEG: An R Package for Modelling Events over Spatial Areas Using Chain Event Graphs
Calley, Hollie, Williamson, Daniel
stCEG is an R package which allows a user to fully specify a Chain Event Graph (CEG) model from data and to produce interactive plots. It includes functions for the user to visualise spatial variables they wish to include in the model. There is also a web-based graphical user interface (GUI) provided, increasing ease of use for those without knowledge of R. We demonstrate stCEG using a dataset of homicides in London, which is included in the package. stCEG is the first software package for CEGs that allows for full model customisation.