Bayesian Inference
VIKING: Deep variational inference with stochastic projections
Fadel, Samuel G., Roy, Hrittik, Krämer, Nicholas, Zainchkovskyy, Yevgen, Syrota, Stas, Mahou, Alejandro Valverde, Ek, Carl Henrik, Hauberg, Søren
Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been the opposite, with unstable training, poor predictive power, and subpar calibration. Building upon recent work on reparametrizations of neural networks, we propose a simple variational family that considers two independent linear subspaces of the parameter space. These represent functional changes inside and outside the support of training data. This allows us to build a fully-correlated approximate posterior reflecting the overparametrization that tunes easy-to-interpret hyperparameters. We develop scalable numerical routines that maximize the associated evidence lower bound (ELBO) and sample from the approximate posterior. Empirically, we observe state-of-the-art performance across tasks, models, and datasets compared to a wide array of baseline methods. Our results show that approximate Bayesian inference applied to deep neural networks is far from a lost cause when constructing inference mechanisms that reflect the geometry of reparametrizations.
Sample-efficient and Scalable Exploration in Continuous-Time RL
Iten, Klemens, Treven, Lenart, Sukhija, Bhavya, Dörfler, Florian, Krause, Andreas
Reinforcement learning algorithms are typically designed for discrete-time dynamics, even though the underlying real-world control systems are often continuous in time. In this paper, we study the problem of continuous-time reinforcement learning, where the unknown system dynamics are represented using nonlinear ordinary differential equations (ODEs). We leverage probabilistic models, such as Gaussian processes and Bayesian neural networks, to learn an uncertainty-aware model of the underlying ODE. Our algorithm, COMBRL, greedily maximizes a weighted sum of the extrinsic reward and model epistemic uncertainty. This yields a scalable and sample-efficient approach to continuous-time model-based RL. We show that COMBRL achieves sublinear regret in the reward-driven setting, and in the unsupervised RL setting (i.e., without extrinsic rewards), we provide a sample complexity bound. In our experiments, we evaluate COMBRL in both standard and unsupervised RL settings and demonstrate that it scales better, is more sample-efficient than prior methods, and outperforms baselines across several deep RL tasks.
Transformers can do Bayesian Clustering
Bhaskaran, Prajit, Viering, Tom
Bayesian clustering accounts for uncertainty but is computationally demanding at scale. Furthermore, real-world datasets often contain missing values, and simple imputation ignores the associated uncertainty, resulting in suboptimal results. We present Cluster-PFN, a Transformer-based model that extends Prior-Data Fitted Networks (PFNs) to unsupervised Bayesian clustering. Trained entirely on synthetic datasets generated from a finite Gaussian Mixture Model (GMM) prior, Cluster-PFN learns to estimate the posterior distribution over both the number of clusters and the cluster assignments. Our method estimates the number of clusters more accurately than handcrafted model selection procedures such as AIC, BIC and Variational Inference (VI), and achieves clustering quality competitive with VI while being orders of magnitude faster. Cluster-PFN can be trained on complex priors that include missing data, outperforming imputation-based baselines on real-world genomic datasets, at high missingness. These results show that the Cluster-PFN can provide scalable and flexible Bayesian clustering.
Central Bank Digital Currency, Flight-to-Quality, and Bank-Runs in an Agent-Based Model
Barucci, Emilio, Gurgone, Andrea, Iori, Giulia, Azzone, Michele
We analyse financial stability and welfare impacts associated with the introduction of a Central Bank Digital Currency (CBDC) in a macroeconomic agent-based model. The model considers firms, banks, and households interacting on labour, goods, credit, and interbank markets. Households move their liquidity from deposits to CBDC based on the perceived riskiness of their banks. We find that the introduction of CBDC exacerbates bank-runs and may lead to financial instability phenomena. The effect can be changed by introducing a limit on CBDC holdings. The adoption of CBDC has little effect on macroeconomic variables but the interest rate on loans to firms goes up and credit goes down in a limited way. CBDC leads to a redistribution of wealth from firms and banks to households with a higher bank default rate. CBDC may have negative welfare effects, but a bound on holding enables a welfare improvement.
Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research
Large Language Models (LLMs) are being increasingly used as synthetic agents in social science, in applications ranging from augmenting survey responses to powering multi-agent simulations. This paper outlines cautions that should be taken when interpreting LLM outputs and proposes a pragmatic reframing for the social sciences in which LLMs are used as high-capacity pattern matchers for quasi-predictive interpolation under explicit scope conditions and not as substitutes for probabilistic inference. Practical guardrails such as independent draws, preregistered human baselines, reliability-aware validation, and subgroup calibration, are introduced so that researchers may engage in useful prototyping and forecasting while avoiding category errors.
Testing-driven Variable Selection in Bayesian Modal Regression
Duan, Jiasong, Zhang, Hongmei, Huang, Xianzheng
We propose a Bayesian variable selection method in the framework of modal regression for heavy-tailed responses. An efficient expectation-maximization algorithm is employed to expedite parameter estimation. A test statistic is constructed to exploit the shape of the model error distribution to effectively separate informative covariates from unimportant ones. Through simulations, we demonstrate and evaluate the efficacy of the proposed method in identifying important covariates in the presence of non-Gaussian model errors. Finally, we apply the proposed method to analyze two datasets arising in genetic and epigenetic studies.
Causal Effect Estimation with TMLE: Handling Missing Data and Near-Violations of Positivity
Wiederkehr, Christoph, Heumann, Christian, Schomaker, Michael
We evaluate the performance of targeted maximum likelihood estimation (TMLE) for estimating the average treatment effect in missing data scenarios under varying levels of positivity violations. We employ model- and design-based simulations, with the latter using undersmoothed highly adaptive lasso on the 'WASH Benefits Bangladesh' dataset to mimic real-world complexities. Five missingness-directed acyclic graphs are considered, capturing common missing data mechanisms in epidemiological research, particularly in one-point exposure studies. These mechanisms include also not-at-random missingness in the exposure, outcome, and confounders. We compare eight missing data methods in conjunction with TMLE as the analysis method, distinguishing between non-multiple imputation (non-MI) and multiple imputation (MI) approaches. The MI approaches use both parametric and machine-learning models. Results show that non-MI methods, particularly complete cases with TMLE incorporating an outcome-missingness model, exhibit lower bias compared to all other evaluated missing data methods and greater robustness against positivity violations across. In Comparison MI with classification and regression trees (CART) achieve lower root mean squared error, while often maintaining nominal coverage rates. Our findings highlight the trade-offs between bias and coverage, and we recommend using complete cases with TMLE incorporating an outcome-missingness model for bias reduction and MI CART when accurate confidence intervals are the priority.
Semi-Supervised Learning under General Causal Models
Moore, Archer, Shim, Heejung, Zhu, Jingge, Gong, Mingming
Semi-supervised learning (SSL) aims to train a machine learning model using both labelled and unlabelled data. While the unlabelled data have been used in various ways to improve the prediction accuracy, the reason why unlabelled data could help is not fully understood. One interesting and promising direction is to understand SSL from a causal perspective. In light of the independent causal mechanisms principle, the unlabelled data can be helpful when the label causes the features but not vice versa. However, the causal relations between the features and labels can be complex in real world applications. In this paper, we propose a SSL framework that works with general causal models in which the variables have flexible causal relations. More specifically, we explore the causal graph structures and design corresponding causal generative models which can be learned with the help of unlabelled data. The learned causal generative model can generate synthetic labelled data for training a more accurate predictive model. We verify the effectiveness of our proposed method by empirical studies on both simulated and real data.
Bayesian Nonlinear PDE Inference via Gaussian Process Collocation with Application to the Richards Equation
Yang, Yumo, Bouazza, Anass Ben, Dong, Xuejun, Zhou, Quan
The estimation of unknown parameters in nonlinear partial differential equations (PDEs) offers valuable insights across a wide range of scientific domains. In this work, we focus on estimating plant root parameters in the Richards equation, which is essential for understanding the soil-plant system in agricultural studies. Since conventional methods are computationally intensive and often yield unstable estimates, we develop a new Gaussian process collocation method for efficient Bayesian inference. Unlike existing Gaussian process-based approaches, our method constructs an approximate posterior distribution using samples drawn from a Gaussian process model fitted to the observed data, which does not require any structural assumption about the underlying PDE. Further, we propose to use an importance sampling procedure to correct for the discrepancy between the approximate and true posterior distributions. As an alternative, we also devise a prior-guided Bayesian optimization algorithm leveraging the approximate posterior. Simulation studies demonstrate that our method yields robust estimates under various settings. Finally, we apply our method on a real agricultural data set and estimate the plant root parameters with uncertainty quantification.
MetaCaDI: A Meta-Learning Framework for Scalable Causal Discovery with Unknown Interventions
Ong, Hans Jarett, Chikahara, Yoichi, Iwata, Tomoharu
Uncovering the underlying causal mechanisms of complex real-world systems remains a significant challenge, as these systems often entail high data collection costs and involve unknown interventions. We introduce MetaCaDI, the first framework to cast the joint discovery of a causal graph and unknown interventions as a meta-learning problem. MetaCaDI is a Bayesian framework that learns a shared causal graph structure across multiple experiments and is optimized to rapidly adapt to new, few-shot intervention target prediction tasks. A key innovation is our model's analytical adaptation, which uses a closed-form solution to bypass expensive and potentially unstable gradient-based bilevel optimization. Extensive experiments on synthetic and complex gene expression data demonstrate that MetaCaDI significantly outperforms state-of-the-art methods. It excels at both causal graph recovery and identifying intervention targets from as few as 10 data instances, proving its robustness in data-scarce scenarios.