Bayesian Inference
Modelling non-stationary extremal dependence through a geometric approach
Murphy-Barltrop, C. J. R., Wadsworth, J. L., de Carvalho, M., Youngman, B. D.
Non-stationary extremal dependence, whereby the relationship between the extremes of multiple variables evolves over time, is commonly observed in many environmental and financial data sets. However, most multivariate extreme value models are only suited to stationary data. A recent approach to multivariate extreme value modelling uses a geometric framework, whereby extremal dependence features are inferred through the limiting shapes of scaled sample clouds. This framework can capture a wide range of dependence structures, and a variety of inference procedures have been proposed in the stationary setting. In this work, we first extend the geometric framework to the non-stationary setting and outline assumptions to ensure the necessary convergence conditions hold. We then introduce a flexible, semi-parametric modelling framework for obtaining estimates of limit sets in the non-stationary setting. Through rigorous simulation studies, we demonstrate that our proposed framework can capture a wide range of dependence forms and is robust to different model formulations. We illustrate the proposed methods on financial returns data and present several practical uses.
Direct Bias-Correction Term Estimation for Propensity Scores and Average Treatment Effect Estimation
This study considers the estimation of the average treatment effect (ATE). For ATE estimation, we estimate the propensity score through direct bias-correction term estimation. Let $\{(X_i, D_i, Y_i)\}_{i=1}^{n}$ be the observations, where $X_i \in \mathbb{R}^p$ denotes $p$-dimensional covariates, $D_i \in \{0, 1\}$ denotes a binary treatment assignment indicator, and $Y_i \in \mathbb{R}$ is an outcome. In ATE estimation, the bias-correction term $h_0(X_i, D_i) = \frac{1[D_i = 1]}{e_0(X_i)} - \frac{1[D_i = 0]}{1 - e_0(X_i)}$ plays an important role, where $e_0(X_i)$ is the propensity score, the probability of being assigned treatment $1$. In this study, we propose estimating $h_0$ (or equivalently the propensity score $e_0$) by directly minimizing the prediction error of $h_0$. Since the bias-correction term $h_0$ is essential for ATE estimation, this direct approach is expected to improve estimation accuracy for the ATE. For example, existing studies often employ maximum likelihood or covariate balancing to estimate $e_0$, but these approaches may not be optimal for accurately estimating $h_0$ or the ATE. We present a general framework for this direct bias-correction term estimation approach from the perspective of Bregman divergence minimization and conduct simulation studies to evaluate the effectiveness of the proposed method.
A Nonparametric Discrete Hawkes Model with a Collapsed Gaussian-Process Prior
Brisley, Trinnhallen, Ross, Gordon, Paulin, Daniel
Hawkes process models are used in settings where past events increase the likelihood of future events occurring. Many applications record events as counts on a regular grid, yet discrete-time Hawkes models remain comparatively underused and are often constrained by fixed-form baselines and excitation kernels. In particular, there is a lack of flexible, nonparametric treatments of both the baseline and the excitation in discrete time. To this end, we propose the Gaussian Process Discrete Hawkes Process (GP-DHP), a nonparametric framework that places Gaussian process priors on both the baseline and the excitation and performs inference through a collapsed latent representation. This yields smooth, data-adaptive structure without prespecifying trends, periodicities, or decay shapes, and enables maximum a posteriori (MAP) estimation with near-linear-time \(O(T\log T)\) complexity. A closed-form projection recovers interpretable baseline and excitation functions from the optimized latent trajectory. In simulations, GP-DHP recovers diverse excitation shapes and evolving baselines. In case studies on U.S. terrorism incidents and weekly Cryptosporidiosis counts, it improves test predictive log-likelihood over standard parametric discrete Hawkes baselines while capturing bursts, delays, and seasonal background variation. The results indicate that flexible discrete-time self-excitation can be achieved without sacrificing scalability or interpretability.
GenUQ: Predictive Uncertainty Estimates via Generative Hyper-Networks
Yen, Tian Yu, Jones, Reese E., Patel, Ravi G.
Operator learning is a recently developed generalization of regression to mappings between functions. It promises to drastically reduce expensive numerical integration of PDEs to fast evaluations of mappings between functional states of a system, i.e., surrogate and reduced-order modeling. Operator learning has already found applications in several areas such as modeling sea ice, combustion, and atmospheric physics. Recent approaches towards integrating uncertainty quantification into the operator models have relied on likelihood based methods to infer parameter distributions from noisy data. However, stochastic operators may yield actions from which a likelihood is difficult or impossible to construct. In this paper, we introduce, GenUQ, a measure-theoretic approach to UQ that avoids constructing a likelihood by introducing a generative hyper-network model that produces parameter distributions consistent with observed data. We demonstrate that GenUQ outperforms other UQ methods in three example problems, recovering a manufactured operator, learning the solution operator to a stochastic elliptic PDE, and modeling the failure location of porous steel under tension.
General Pruning Criteria for Fast SBL
Möderl, Jakob, Leitinger, Erik, Fleury, Bernard Henri
Sparse Bayesian learning (SBL) associates to each weight in the underlying linear model a hyperparameter by assuming that each weight is Gaussian distributed with zero mean and precision (inverse variance) equal to its associated hyperparameter. The method estimates the hyperparameters by marginalizing out the weights and performing (marginalized) maximum likelihood (ML) estimation. SBL returns many hyperparameter estimates to diverge to infinity, effectively setting the estimates of the corresponding weights to zero (i.e., pruning the corresponding weights from the model) and thereby yielding a sparse estimate of the weight vector. In this letter, we analyze the marginal likelihood as function of a single hyperparameter while keeping the others fixed, when the Gaussian assumptions on the noise samples and the weight distribution that underlies the derivation of SBL are weakened. We derive sufficient conditions that lead, on the one hand, to finite hyperparameter estimates and, on the other, to infinite ones. Finally, we show that in the Gaussian case, the two conditions are complementary and coincide with the pruning condition of fast SBL (F-SBL), thereby providing additional insights into this algorithm.
Machine Learning. The Science of Selection under Uncertainty
Learning, whether natural or artificial, is a process of selection. It starts with a set of candidate options and selects the more successful ones. In the case of machine learning the selection is done based on empirical estimates of prediction accuracy of candidate prediction rules on some data. Due to randomness of data sampling the empirical estimates are inherently noisy, leading to selection under uncertainty. The book provides statistical tools to obtain theoretical guarantees on the outcome of selection under uncertainty. We start with concentration of measure inequalities, which are the main statistical instrument for controlling how much an empirical estimate of expectation of a function deviates from the true expectation. The book covers a broad range of inequalities, including Markov's, Chebyshev's, Hoeffding's, Bernstein's, Empirical Bernstein's, Unexpected Bernstein's, kl, and split-kl. We then study the classical (offline) supervised learning and provide a range of tools for deriving generalization bounds, including Occam's razor, Vapnik-Chervonenkis analysis, and PAC-Bayesian analysis. The latter is further applied to derive generalization guarantees for weighted majority votes. After covering the offline setting, we turn our attention to online learning. We present the space of online learning problems characterized by environmental feedback, environmental resistance, and structural complexity. A common performance measure in online learning is regret, which compares performance of an algorithm to performance of the best prediction rule in hindsight, out of a restricted set of prediction rules. We present tools for deriving regret bounds in stochastic and adversarial environments, and under full information and bandit feedback.
Chiseling: Powerful and Valid Subgroup Selection via Interactive Machine Learning
Cheng, Nathan, Spector, Asher, Janson, Lucas
In regression and causal inference, controlled subgroup selection aims to identify, with inferential guarantees, a subgroup (defined as a subset of the covariate space) on which the average response or treatment effect is above a given threshold. E.g., in a clinical trial, it may be of interest to find a subgroup with a positive average treatment effect. However, existing methods either lack inferential guarantees, heavily restrict the search for the subgroup, or sacrifice efficiency by naive data splitting. We propose a novel framework called chiseling that allows the analyst to interactively refine and test a candidate subgroup by iteratively shrinking it. The sole restriction is that the shrinkage direction only depends on the points outside the current subgroup, but otherwise the analyst may leverage any prior information or machine learning algorithm. Despite this flexibility, chiseling controls the probability that the discovered subgroup is null (e.g., has a non-positive average treatment effect) under minimal assumptions: for example, in randomized experiments, this inferential validity guarantee holds under only bounded moment conditions. When applied to a variety of simulated datasets and a real survey experiment, chiseling identifies substantially better subgroups than existing methods with inferential guarantees.
Integrating Knowledge Graphs and Bayesian Networks: A Hybrid Approach for Explainable Disease Risk Prediction
Nzomo, Mbithe, Moodley, Deshendran
Multimodal electronic health record (EHR) data is useful for disease risk prediction based on medical domain knowledge. However, general medical knowledge must be adapted to specific healthcare settings and patient populations to achieve practical clinical use. Additionally, risk prediction systems must handle uncertainty from incomplete data and non-deterministic health outcomes while remaining explainable. These challenges can be alleviated by the integration of knowledge graphs (KGs) and Bayesian networks (BNs). We present a novel approach for constructing BNs from ontology-based KGs and multimodal EHR data for explainable disease risk prediction. Through an application use case of atrial fibrillation and real-world EHR data, we demonstrate that the approach balances generalised medical knowledge with patient-specific context, effectively handles uncertainty, is highly explainable, and achieves good predictive performance.