Goto

Collaborating Authors

 Bayesian Inference


Bayesian Double Descent

arXiv.org Machine Learning

Double descent is a phenomenon of over-parameterized statistical models. Our goal is to view double descent from a Bayesian perspective. Over-parameterized models such as deep neural networks have an interesting re-descending property in their risk characteristics. This is a recent phenomenon in machine learning and has been the subject of many studies. As the complexity of the model increases, there is a U-shaped region corresponding to the traditional bias-variance trade-off, but then as the number of parameters equals the number of observations and the model becomes one of interpolation, the risk can become infinite and then, in the over-parameterized region, it re-descends -- the double descent effect. We show that this has a natural Bayesian interpretation. Moreover, we show that it is not in conflict with the traditional Occam's razor that Bayesian models possess, in that they tend to prefer simpler models when possible. We illustrate the approach with an example of Bayesian model selection in neural networks. Finally, we conclude with directions for future research.


Goal-Oriented Sequential Bayesian Experimental Design for Causal Learning

arXiv.org Machine Learning

We present GO-CBED, a goal-oriented Bayesian framework for sequential causal experimental design. Unlike conventional approaches that select interventions aimed at inferring the full causal model, GO-CBED directly maximizes the expected information gain (EIG) on user-specified causal quantities of interest, enabling more targeted and efficient experimentation. The framework is both non-myopic, optimizing over entire intervention sequences, and goal-oriented, targeting only model aspects relevant to the causal query. To address the intractability of exact EIG computation, we introduce a variational lower bound estimator, optimized jointly through a transformer-based policy network and normalizing flow-based variational posteriors. The resulting policy enables real-time decision-making via an amortized network. We demonstrate that GO-CBED consistently outperforms existing baselines across various causal reasoning and discovery tasks-including synthetic structural causal models and semi-synthetic gene regulatory networks-particularly in settings with limited experimental budgets and complex causal mechanisms. Our results highlight the benefits of aligning experimental design objectives with specific research goals and of forward-looking sequential planning.


Bayesian Invariance Modeling of Multi-Environment Data

arXiv.org Machine Learning

Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features - those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here we develop Bayesian Invariant Prediction (BIP), a probabilistic model for invariant prediction. BIP encodes the indices of invariant features as a latent variable and recover them by posterior inference. Under the assumptions of Peters et al. [2016], the BIP posterior targets the true invariant features. We prove that the posterior is consistent and that greater environment heterogeneity leads to faster posterior contraction. To handle many features, we design an efficient variational approximation called VI-BIP. In simulations and real data, we find that BIP and VI-BIP are more accurate and scalable than existing methods for invariant prediction.


stCEG: An R Package for Modelling Events over Spatial Areas Using Chain Event Graphs

arXiv.org Machine Learning

stCEG is an R package which allows a user to fully specify a Chain Event Graph (CEG) model from data and to produce interactive plots. It includes functions for the user to visualise spatial variables they wish to include in the model. There is also a web-based graphical user interface (GUI) provided, increasing ease of use for those without knowledge of R. We demonstrate stCEG using a dataset of homicides in London, which is included in the package. stCEG is the first software package for CEGs that allows for full model customisation.


Fast Gaussian Processes under Monotonicity Constraints

arXiv.org Machine Learning

Gaussian processes (GPs) are widely used as surrogate models for complicated functions in scientific and engineering applications. In many cases, prior knowledge about the function to be approximated, such as monotonicity, is available and can be leveraged to improve model fidelity. Incorporating such constraints into GP models enhances predictive accuracy and reduces uncertainty, but remains a computationally challenging task for high-dimensional problems. In this work, we present a novel virtual point-based framework for building constrained GP models under monotonicity constraints, based on regularized linear randomize-then-optimize (RLRTO), which enables efficient sampling from a constrained posterior distribution by means of solving randomized optimization problems. We also enhance two existing virtual point-based approaches by replacing Gibbs sampling with the No U-Turn Sampler (NUTS) for improved efficiency. A Python implementation of these methods is provided and can be easily applied to a wide range of problems. This implementation is then used to validate the approaches on approximating a range of synthetic functions, demonstrating comparable predictive performance between all considered methods and significant improvements in computational efficiency with the two NUTS methods and especially with the RLRTO method. The framework is further applied to construct surrogate models for systems of differential equations.


Distribution-free inference for LightGBM and GLM with Tweedie loss

arXiv.org Machine Learning

Prediction uncertainty quantification is a key research topic in recent years scientific and business problems. In insurance industries (\cite{parodi2023pricing}), assessing the range of possible claim costs for individual drivers improves premium pricing accuracy. It also enables insurers to manage risk more effectively by accounting for uncertainty in accident likelihood and severity. In the presence of covariates, a variety of regression-type models are often used for modeling insurance claims, ranging from relatively simple generalized linear models (GLMs) to regularized GLMs to gradient boosting models (GBMs). Conformal predictive inference has arisen as a popular distribution-free approach for quantifying predictive uncertainty under relatively weak assumptions of exchangeability, and has been well studied under the classic linear regression setting. In this work, we propose new non-conformity measures for GLMs and GBMs with GLM-type loss. Using regularized Tweedie GLM regression and LightGBM with Tweedie loss, we demonstrate conformal prediction performance with these non-conformity measures in insurance claims data. Our simulation results favor the use of locally weighted Pearson residuals for LightGBM over other methods considered, as the resulting intervals maintained the nominal coverage with the smallest average width.


Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning

arXiv.org Machine Learning

Closed-loop learning is the process of repeatedly estimating a model from data generated from the model itself. It is receiving great attention due to the possibility that large neural network models may, in the future, be primarily trained with data generated by artificial neural networks themselves. We study this process for models that belong to exponential families, deriving equations of motions that govern the dynamics of the parameters. We show that maximum likelihood estimation of the parameters endows sufficient statistics with the martingale property and that as a result the process converges to absorbing states that amplify initial biases present in the data. However, we show that this outcome may be prevented if the data contains at least one data point generated from a ground truth model, by relying on maximum a posteriori estimation or by introducing regularisation.


SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN

arXiv.org Artificial Intelligence

The growing demand for efficient knowledge graph (KG) enrichment leveraging external corpora has intensified interest in relation extraction (RE), particularly under low-supervision settings. To address the need for adaptable and noise-resilient RE solutions that integrate seamlessly with pre-trained large language models (PLMs), we introduce SCoRE, a modular and cost-effective sentence-level RE system. SCoRE enables easy PLM switching, requires no finetuning, and adapts smoothly to diverse corpora and KGs. By combining supervised contrastive learning with a Bayesian k-Nearest Neighbors (kNN) classifier for multi-label classification, it delivers robust performance despite the noisy annotations of distantly supervised corpora. To improve RE evaluation, we propose two novel metrics: Correlation Structure Distance (CSD), measuring the alignment between learned relational patterns and KG structures, and Precision at R (P@R), assessing utility as a recommender system. We also release Wiki20d, a benchmark dataset replicating real-world RE conditions where only KG-derived annotations are available. Experiments on five benchmarks show that SCoRE matches or surpasses state-of-the-art methods while significantly reducing energy consumption. Further analyses reveal that increasing model complexity, as seen in prior work, degrades performance, highlighting the advantages of SCoRE's minimal design. Combining efficiency, modularity, and scalability, SCoRE stands as an optimal choice for real-world RE applications.


Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators

arXiv.org Machine Learning

Distances between probability distributions are a key component of many statistical machine learning tasks, from two-sample testing to generative modeling, among others. We introduce a novel distance between measures that compares them through a Schatten norm of their kernel covariance operators. We show that this new distance is an integral probability metric that can be framed between a Maximum Mean Discrepancy (MMD) and a Wasserstein distance. In particular, we show that it avoids some pitfalls of MMD, by being more discriminative and robust to the choice of hyperparameters. Moreover, it benefits from some compelling properties of kernel methods, that can avoid the curse of dimensionality for their sample complexity. We provide an algorithm to compute the distance in practice by introducing an extension of kernel matrix for difference of distributions that could be of independent interest. Those advantages are illustrated by robust approximate Bayesian computation under contamination as well as particle flow simulations.


Estimating prevalence with precision and accuracy

arXiv.org Machine Learning

Unlike classification, whose goal is to estimate the class of each data point in a dataset, prevalence estimation or quantification is a task that aims to estimate the distribution of classes in a dataset. The two main tasks in prevalence estimation are to adjust for bias, due to the prevalence in the training dataset, and to quantify the uncertainty in the estimate. The standard methods used to quantify uncertainty in prevalence estimates are bootstrapping and Bayesian quantification methods. It is not clear which approach is ideal in terms of precision (i.e. the width of confidence intervals) and coverage (i.e. the confidence intervals being well-calibrated). Here, we propose Precise Quantifier (PQ), a Bayesian quantifier that is more precise than existing quantifiers and with well-calibrated coverage. We discuss the theory behind PQ and present experiments based on simulated and real-world datasets. Through these experiments, we establish the factors which influence quantification precision: the discriminatory power of the underlying classifier; the size of the labeled dataset used to train the quantifier; and the size of the unlabeled dataset for which prevalence is estimated. Our analysis provides deep insights into uncertainty quantification for quantification learning.