Bayesian Inference
Rethinking recidivism through a causal lens
Shirvaikar, Vik, Lakshminarayan, Choudur
Predictive modeling of criminal recidivism, or whether people will re-offend in the future, has a long and contentious history. Modern causal inference methods allow us to move beyond prediction and target the "treatment effect" of a specific intervention on an outcome in an observational dataset. In this paper, we look specifically at the effect of incarceration (prison time) on recidivism, using a well-known dataset from North Carolina. Two popular causal methods for addressing confounding bias are explained and demonstrated: directed acyclic graph (DAG) adjustment and double machine learning (DML), including a sensitivity analysis for unobserved confounders. We find that incarceration has a detrimental effect on recidivism, i.e., longer prison sentences make it more likely that individuals will re-offend after release, although this conclusion should not be generalized beyond the scope of our data. We hope that this case study can inform future applications of causal inference to criminal justice analysis.
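As a rough illustration of the double machine learning idea mentioned above, the following sketch implements the partialling-out estimator with cross-fitting on synthetic data. Everything in it is an assumption made for illustration: the function names (`fit_line`, `dml_partial_out`, `simulate`), the single confounder, the simulated effect of 0.5, and the use of simple least-squares lines in place of flexible machine-learning nuisance models. It is not the paper's analysis and does not use the North Carolina data.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one regressor; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def dml_partial_out(y, d, x, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of the effect of d on y.

    Nuisance models (y given x, d given x) are fitted on the training folds
    and used to residualize the held-out fold. The effect is then the slope
    of the outcome residuals on the treatment residuals.
    """
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_y = [0.0] * n
    res_d = [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
        ad, bd = fit_line([x[i] for i in train], [d[i] for i in train])
        for i in held_out:
            res_y[i] = y[i] - (ay + by * x[i])
            res_d[i] = d[i] - (ad + bd * x[i])
    num = sum(a * b for a, b in zip(res_d, res_y))
    den = sum(a * a for a in res_d)
    return num / den


def simulate(n=4000, theta=0.5, seed=1):
    """Synthetic data (assumed): x confounds both treatment d and outcome y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    d = [xi + rng.gauss(0, 1) for xi in x]
    y = [theta * di + 0.8 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]
    return y, d, x


if __name__ == "__main__":
    y, d, x = simulate()
    _, naive = fit_line(d, y)  # ignores the confounder, so it is biased
    print("naive slope:", naive)
    print("DML estimate:", dml_partial_out(y, d, x))
```

In this simulated setting the naive regression of the outcome on the treatment absorbs the confounder's influence, while the residual-on-residual estimate should land close to the simulated effect.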
Guiding adaptive shrinkage by co-data to improve regression-based prediction and feature selection
van de Wiel, Mark A., van Wieringen, Wessel N.
The high-dimensional nature of genomics data complicates feature selection, particularly in studies with small sample sizes, which are not uncommon in clinical prediction settings. It is widely recognized that complementary data on the features, 'co-data', may improve results. Examples are prior feature groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories. Yet, the uptake of learning methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, which are crucial for the performance of those learners. We discuss technical aspects as well as applicability in terms of the types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating feature selection. Finally, we demonstrate the versatility of the guided shrinkage methodology by showing how to 'do-it-yourself': we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving feature selection in genetics studies.
Harmonizing Program Induction with Rate-Distortion Theory
Zhou, Hanqi, Nagy, David G., Wu, Charley M.
Many aspects of human learning have been proposed as a process of constructing mental programs: from acquiring symbolic number representations to intuitive theories about the world. In parallel, there is a long tradition of using information processing to model human cognition through Rate Distortion Theory (RDT). Yet, it is still poorly understood how to apply RDT when mental representations take the form of programs. In this work, we adapt RDT by proposing a three-way trade-off among rate (description length), distortion (error), and computational costs (search budget). We use simulations on a melody task to study the implications of this trade-off, and show that constructing a shared program library across tasks provides global benefits. However, this comes at the cost of sensitivity to curricula, which is also characteristic of human learners. Finally, we use methods from partial information decomposition to generate training curricula that induce more effective libraries and better generalization.
Inference With Combining Rules From Multiple Differentially Private Synthetic Datasets
Nombo, Leila, Charest, Anne-Sophie
Differential privacy (DP) has been accepted as a rigorous criterion for measuring the privacy protection offered by random mechanisms used to obtain statistics or, as we will study here, synthetic datasets from confidential data. Methods to generate such datasets are increasingly numerous, using varied tools including Bayesian models, deep neural networks and copulas. However, little is known about how to properly perform statistical inference with these differentially private synthetic (DIPS) datasets. The challenge is for the analyses to take into account the variability from the synthetic data generation in addition to the usual sampling variability. A similar challenge also occurs when missing data is imputed before analysis, and statisticians have developed appropriate inference procedures for this case, which were then extended to the case of synthetic datasets for privacy. In this work, we study the applicability of these procedures, based on combining rules, to the analysis of DIPS datasets. Our empirical experiments show that the proposed combining rules may offer accurate inference in certain contexts, but not in all cases.
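For readers unfamiliar with combining rules, the sketch below shows the classical multiple-imputation rule (Rubin's rule), which pools a point estimate and its variance across several completed datasets. It is given only as background: the abstract does not state which specific rules the paper evaluates, the variants developed for synthetic data use different variance formulas, and the function name `combine_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool m per-dataset estimates and their variances with Rubin's rule.

    Returns (pooled_estimate, total_variance), where the total variance is
    the average within-dataset variance plus (1 + 1/m) times the
    between-dataset variance of the point estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two datasets and matching lengths")
    q_bar = mean(estimates)    # pooled point estimate
    u_bar = mean(variances)    # average within-dataset variance
    b = variance(estimates)    # between-dataset variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


if __name__ == "__main__":
    # Hypothetical estimates of one parameter from three datasets.
    pooled, total_var = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(pooled, total_var)
```

The between-dataset term is what carries the extra variability introduced by generating the data several times, which a single-dataset analysis would miss.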
Accelerating Convergence in Bayesian Few-Shot Classification
Ke, Tianjun, Cao, Haoqun, Zhou, Feng
Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent direction along the corresponding manifold. It also exhibits the parameterization invariance property concerning the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models. Additionally, we investigate the impact of hyperparameters and components. Code is publicly available at https://github.com/keanson/MD-BSFC.
Network reconstruction via the minimum description length principle
A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting, and produces an inferred network with a statistically justifiable number of edges. The status quo in this context is based on $L_{1}$ regularization combined with cross-validation. However, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity to weight "shrinkage". This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster to employ, as it requires a single fit to the complete data. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of edges to be known in advance. We also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving on the order of $10^{4}$ to $10^{5}$ species, and demonstrate how the inferred model can be used to predict the outcome of interventions in the system.
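To see why $L_{1}$ regularization ties sparsity to shrinkage, recall that for an orthonormal design the $L_{1}$-penalized least-squares solution applies the soft-thresholding operator to each unpenalized coefficient. The sketch below illustrates only this standard fact and not the proposed MDL scheme; the function name `soft_threshold` and the example weights are illustrative choices.

```python
import math


def soft_threshold(weight, penalty):
    """Soft-thresholding: zero out small weights, shrink the rest by `penalty`."""
    return math.copysign(max(abs(weight) - penalty, 0.0), weight)


if __name__ == "__main__":
    weights = [3.0, -3.0, 0.5, -0.2]
    for penalty in (0.1, 1.0):
        shrunk = [soft_threshold(w, penalty) for w in weights]
        print(penalty, shrunk)
```

Raising the penalty far enough to remove the small weights also pulls every surviving weight toward zero by the same amount, which is the bias-versus-sparsity trade-off the abstract refers to.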
Scalable Vertical Federated Learning via Data Augmentation and Amortized Inference
Hassan, Conor, Sutton, Matthew, Mira, Antonietta, Mengersen, Kerrie
Vertical federated learning (VFL) has emerged as a paradigm for collaborative model estimation across multiple clients, each holding a distinct set of covariates. This paper introduces the first comprehensive framework for fitting Bayesian models in the VFL setting. We propose a novel approach that leverages data augmentation techniques to transform VFL problems into a form compatible with existing Bayesian federated learning algorithms. We present an innovative model formulation for specific VFL scenarios where the joint likelihood factorizes into a product of client-specific likelihoods. To mitigate the dimensionality challenge posed by data augmentation, which scales with the number of observations and clients, we develop a factorized amortized variational approximation that achieves scalability independent of the number of observations. We showcase the efficacy of our framework through extensive numerical experiments on logistic regression, multilevel regression, and a novel hierarchical Bayesian split neural net model. Our work paves the way for privacy-preserving, decentralized Bayesian inference in vertically partitioned data scenarios, opening up new avenues for research and applications in various domains.
Joint Parameter and Parameterization Inference with Uncertainty Quantification through Differentiable Programming
Qu, Yongquan, Bhouri, Mohamed Aziz, Gentine, Pierre
Accurate representations of unknown and sub-grid physical processes through parameterizations (or closure) in numerical simulations with quantified uncertainty are critical for resolving the coarse-grained partial differential equations that govern many problems ranging from weather and climate prediction to turbulence simulations. Recent advances have seen machine learning (ML) increasingly applied to model these subgrid processes, resulting in the development of hybrid physics-ML models through the integration with numerical solvers. In this work, we introduce a novel framework for the joint estimation of physical parameters and machine learning parameterizations with uncertainty quantification. Our framework incorporates online training and efficient Bayesian inference within a high-dimensional parameter space, facilitated by differentiable programming. This proof of concept underscores the substantial potential of differentiable programming in synergistically combining machine learning with differential equations, thereby enhancing the capabilities of hybrid physics-ML modeling.
The Role of Predictive Uncertainty and Diversity in Embodied AI and Robot Learning
Uncertainty has long been a critical area of study in robotics, particularly when robots are equipped with analytical models. As we move towards the widespread use of deep neural networks in robots, which have demonstrated remarkable performance in research settings, understanding the nuances of uncertainty becomes crucial for their real-world deployment. This guide offers an overview of the importance of uncertainty and provides methods to quantify and evaluate it from an applications perspective.
Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models
Onal, Emre, Flöge, Klemens, Caldwell, Emma, Sheverdin, Arsen, Fortuin, Vincent
Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets. To address these challenges, we propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG), facilitating approximate Bayesian inference in LLMs. Through extensive testing across several Natural Language Processing (NLP) benchmarks, we demonstrate that our straightforward and computationally efficient approach improves model generalization and calibration competitively with comparable, more sophisticated methods for Bayesian inference in LLMs. We further show that our method exhibits greater robustness against distribution shift, as reflected in its performance on out-of-distribution tasks.
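As a minimal sketch of the weight-averaging idea, the code below implements only the diagonal form of SWAG: it tracks running first and second moments of a parameter vector across training iterates and samples from the resulting Gaussian. This is an assumption-laden simplification, not the authors' implementation. It omits the low-rank covariance term of full SWAG, the class name `DiagonalSWAG` is hypothetical, and the tracked vector merely stands in for whichever parameters are being fine-tuned.

```python
import math
import random


class DiagonalSWAG:
    """Running Gaussian approximation over weight iterates (diagonal only)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.sq_mean = [0.0] * dim

    def collect(self, weights):
        """Update the running mean and running mean of squares with one iterate."""
        self.n += 1
        for i, w in enumerate(weights):
            self.mean[i] += (w - self.mean[i]) / self.n
            self.sq_mean[i] += (w * w - self.sq_mean[i]) / self.n

    def variance(self):
        """Per-coordinate variance, clipped at zero against round-off."""
        return [max(s - m * m, 0.0) for m, s in zip(self.mean, self.sq_mean)]

    def sample(self, rng):
        """Draw one weight vector from N(mean, diag(variance))."""
        return [m + math.sqrt(v) * rng.gauss(0, 1)
                for m, v in zip(self.mean, self.variance())]


if __name__ == "__main__":
    swag = DiagonalSWAG(dim=2)
    # Hypothetical weight iterates gathered late in training.
    for iterate in ([1.0, 2.0], [3.0, 2.0]):
        swag.collect(iterate)
    print(swag.mean, swag.variance())
    print(swag.sample(random.Random(0)))
```

Predictions would then be averaged over several sampled weight vectors, which is where the calibration benefit of an approximate posterior is expected to come from.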