Learning Graphical Models
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
Artificial intelligence (AI) is moving increasingly beyond prediction to support decisions in complex, uncertain, and dynamic environments. This shift creates a natural intersection with operations research and management sciences (OR/MS), which have long offered conceptual and methodological foundations for sequential decision-making under uncertainty. At the same time, recent advances in deep learning, including feedforward neural networks, LSTMs, transformers, and deep reinforcement learning, have expanded the scope of data-driven modeling and opened new possibilities for large-scale decision systems. This tutorial presents an OR/MS-centered perspective on deep learning for sequential decision-making under uncertainty. Its central premise is that deep learning is valuable not as a replacement for optimization, but as a complement to it. Deep learning brings adaptability and scalable approximation, whereas OR/MS provides the structural rigor needed to represent constraints, recourse, and uncertainty. The tutorial reviews key decision-making foundations, connects them to the major neural architectures in modern AI, and discusses leading approaches to integrating learning and optimization. It also highlights emerging impact in domains such as supply chains, healthcare and epidemic response, agriculture, energy, and autonomous operations. More broadly, it frames these developments as part of a wider transition from predictive AI toward decision-capable AI and highlights the role of OR/MS in shaping the next generation of integrated learning--optimization systems.
A Deep Generative Approach to Stratified Learning
Martinez, Randy, Tang, Rong, Lin, Lizhen
While the manifold hypothesis is widely adopted in modern machine learning, complex data is often better modeled as stratified spaces -- unions of manifolds (strata) of varying dimensions. Stratified learning is challenging due to varying dimensionality, intersection singularities, and lack of efficient models in learning the underlying distributions. We provide a deep generative approach to stratified learning by developing two generative frameworks for learning distributions on stratified spaces. The first is a sieve maximum likelihood approach realized via a dimension-aware mixture of variational autoencoders. The second is a diffusion-based framework that explores the score field structure of a mixture. We establish the convergence rates for learning both the ambient and intrinsic distributions, which are shown to be dependent on the intrinsic dimensions and smoothness of the underlying strata. Utilizing the geometry of the score field, we also establish consistency for estimating the intrinsic dimension of each stratum and propose an algorithm that consistently estimates both the number of strata and their dimensions. Theoretical results for both frameworks provide fundamental insights into the interplay of the underlying geometry, the ambient noise level, and deep generative models. Extensive simulations and real dataset applications, such as molecular dynamics, demonstrate the effectiveness of our methods.
Uncertainty-Aware Sparse Identification of Dynamical Systems via Bayesian Model Averaging
Kashiwamura, Shuhei, Kato, Yusuke, Kori, Hiroshi, Okada, Masato
In many problems of data-driven modeling for dynamical systems, the governing equations are not known a priori and must be selected phenomenologically from a large set of candidate interactions and basis functions. In such situations, point estimates alone can be misleading, because multiple model components may explain the observed data comparably well, especially when the data are limited or the dynamics exhibit poor identifiability. Quantifying the uncertainty associated with model selection is therefore essential for constructing reliable dynamical models from data. In this work, we develop a Bayesian sparse identification framework for dynamical systems with coupled components, aimed at inferring both interaction structure and functional form together with principled uncertainty quantification. The proposed method combines sparse modeling with Bayesian model averaging, yielding posterior inclusion probabilities that quantify the credibility of each candidate interaction and basis component. Through numerical experiments on oscillator networks, we show that the framework accurately recovers sparse interaction structures with quantified uncertainty, including higher-order harmonic components, phase-lag effects, and multi-body interactions. We also demonstrate that, even in a phenomenological setting where the true governing equations are not contained in the assumed model class, the method can identify effective functional components with quantified uncertainty. These results highlight the importance of Bayesian uncertainty quantification in data-driven discovery of dynamical models.
A Predictive View on Streaming Hidden Markov Models
We develop a predictive-first optimisation framework for streaming hidden Markov models. Unlike classical approaches that prioritise full posterior recovery under a fully specified generative model, we assume access to regime-specific predictive models whose parameters are learned online while maintaining a fixed transition prior over regimes. Our objective is to sequentially identify latent regimes while maintaining accurate step-ahead predictive distributions. Because the number of possible regime paths grows exponentially, exact filtering is infeasible. We therefore formulate streaming inference as a constrained projection problem in predictive-distribution space: under a fixed hypothesis budget, we approximate the full posterior predictive by the forward-KL optimal mixture supported on $S$ paths. The solution is the renormalised top-$S$ posterior-weighted mixture, providing a principled derivation of beam search for HMMs. The resulting algorithm is fully recursive and deterministic, performing beam-style truncation with closed-form predictive updates and requiring neither EM nor sampling. Empirical comparisons against Online EM and Sequential Monte Carlo under matched computational budgets demonstrate competitive prequential performance.
Data-Efficient Non-Gaussian Semi-Nonparametric Density Estimation for Nonlinear Dynamical Systems
Liao, Aaron R., Oguri, Kenshiro, Carpenter, Michele D.
Accurate representation of non-Gaussian distributions of quantities of interest in nonlinear dynamical systems is critical for estimation, control, and decision-making, but can be challenging when forward propagations are expensive to carry out. This paper presents an approach for estimating probability density functions of states evolving under nonlinear dynamics using Seminonparametric (SNP), or Gallant-Nychka, densities. SNP densities employ a probabilists' Hermite polynomial basis to model non-Gaussian behavior and are positive everywhere on the support by construction. We use Monte Carlo to approximate the expectation integrals that arise in the maximum likelihood estimation of SNP coefficients, and introduce a convex relaxation to generate effective initial estimates. The method is demonstrated on density and quantile estimation for the chaotic Lorenz system. The results demonstrate that the proposed method can accurately capture non-Gaussian density structure and compute quantiles using significantly fewer samples than raw Monte Carlo sampling.
tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data
Ibenegbu, Amuche, de Micheaux, Pierre Lafaye, Chandra, Rohitash
Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) has been prominent for imputing missing values through "fully conditional specification". We extend MICE using the Bayesian framework (tBayes-MICE), utilising Bayesian inference to impute missing values via Markov Chain Monte Carlo (MCMC) sampling to account for uncertainty in MICE model parameters and imputed values. We also include temporally informed initialisation and time-lagged features in the model to respect the sequential nature of time-series data. We evaluate the tBayes-MICE method using two real-world datasets (AirQuality and PhysioNet), and using both the Random Walk Metropolis (RWM) and the Metropolis-Adjusted Langevin Algorithm (MALA) samplers. Our results demonstrate that tBayes-MICE reduces imputation errors relative to the baseline methods over all variables and accounts for uncertainty in the imputation process, thereby providing a more accurate measure of imputation error. We also found that MALA mixed better than RWM across most variables, achieving comparable accuracy while providing more consistent posterior exploration. Overall, these findings suggest that the tBayes-MICE framework represents a practical and efficient approach to time-series imputation, balancing increased accuracy with meaningful quantification of uncertainty in various environmental and clinical settings.
Computationally lightweight classifiers with frequentist bounds on predictions
Murali, Shreeram, Rojas, Cristian R., Baumann, Dominik
While both classical and neural network classifiers can achieve high accuracy, they fall short on offering uncertainty bounds on their predictions, making them unfit for safety-critical applications. Existing kernel-based classifiers that provide such bounds scale with $\mathcal O (n^{\sim3})$ in time, making them computationally intractable for large datasets. To address this, we propose a novel, computationally efficient classification algorithm based on the Nadaraya-Watson estimator, for whose estimates we derive frequentist uncertainty intervals. We evaluate our classifier on synthetically generated data and on electrocardiographic heartbeat signals from the MIT-BIH Arrhythmia database. We show that the method achieves competitive accuracy $>$\SI{96}{\percent} at $\mathcal O(n)$ and $\mathcal O(\log n)$ operations, while providing actionable uncertainty bounds. These bounds can, e.g., aid in flagging low-confidence predictions, making them suitable for real-time settings with resource constraints, such as diagnostic monitoring or implantable devices.
A Bayesian Information-Theoretic Approach to Data Attribution
Tailor, Dharmesh, Felicioni, Nicolò, Ciosek, Kamil
Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce - the entropy increase at a query when removed. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction for scalable attribution in vector databases. Experiments show competitive performance on counterfactual sensitivity, ground-truth retrieval and coreset selection, showing that our method scales to modern architectures while bridging principled measures with practice.
Massively Parallel Exact Inference for Hawkes Processes
Multivariate Hawkes processes are a widely used class of self-exciting point processes, but maximum likelihood estimation naively scales as $O(N^2)$ in the number of events. The canonical linear exponential Hawkes process admits a faster $O(N)$ recurrence, but prior work evaluates this recurrence sequentially, without exploiting parallelization on modern GPUs. We show that the Hawkes process intensity can be expressed as a product of sparse transition matrices admitting a linear-time associative multiply, enabling computation via a parallel prefix scan. This yields a simple yet massively parallelizable algorithm for maximum likelihood estimation of linear exponential Hawkes processes. Our method reduces the computational complexity to approximately $O(N/P)$ with $P$ parallel processors, and naturally yields a batching scheme to maintain constant memory usage, avoiding GPU memory constraints. Importantly, it computes the exact likelihood without any additional assumptions or approximations, preserving the simplicity and interpretability of the model. We demonstrate orders-of-magnitude speedups on simulated and real datasets, scaling to thousands of nodes and tens of millions of events, substantially beyond scales reported in prior work. We provide an open-source PyTorch library implementing our optimizations.
A Direct Approach for Handling Contextual Bandits with Latent State Dynamics
We revisit the finite-armed linear bandit model by Nelson et al. (2022), where contexts and rewards are governed by a finite hidden Markov chain. Nelson et al. (2022) approach this model by a reduction to linear contextual bandits; but to do so, they actually introduce a simplification in which rewards are linear functions of the posterior probabilities over the hidden states given the observed contexts, rather than functions of the hidden states themselves. Their analysis (but not their algorithm) also does not take into account the estimation of the HMM parameters, and only tackles expected, not high-probability, bounds, which suffer in addition from unnecessary complex dependencies on the model (like reward gaps). We instead study the more natural model incorporating direct dependencies in the hidden states (on top of dependencies on the observed contexts, as is natural for contextual bandits) and also obtain stronger, high-probability, regret bounds for a fully adaptive strategy that estimates HMM parameters online. These bounds do not depend on the reward functions and only depend on the model through the estimation of the HMM parameters.