Uncertainty
Supplementary material for Dynamic Causal Bayesian Optimisation
Symbol            Description
$V_t$             Set of observable variables at time t
$V_{0:T}$         Union of observable variables at times t = 0,...,T
$X_t$             Manipulative variables at time t
$Y_t$             Target variable at time t
$P(X_t)$          Power set of $X_t$
$M_t$             Set of MIS sets at time t
$X_{s,t}$         s-th intervention set at time t

In this section we give the proof for Theorem 1 in the main text. This means that W includes those variables that are parents of $Y_t$ but are neither targets at previous time steps nor intervened variables. In the following proof the values of $I^{V}_{0:t-1}$, $X^{PY}_{s,t}$, $I^{PY}_{0:t-1}$ and $W$ are denoted by $i$, $x^{PY}$, $i^{PY}$ and $w$ respectively. Finally, $f^{Y}_{Y}$ and $f^{NY}_{Y}$ are the functions in the SCM for $Y_t$ (see Assumptions (1) in the main text). Eq. (2) follows from the Eq. Finally, noticing that $p(y^{PT}_t \mid I_{0:t-1})$ is the distribution targeted when optimizing the objective function at previous time steps, one can obtain Eq. (6).

The derivations above show that the objective function at time t is given by the expected value of the output of the functional relationship $f^{NY}_{Y}$, where the expectation is taken with respect to the variables that are not intervened on. This expectation is then shifted to account for the interventions implemented in the system at previous time steps that affect the target variable through $f^{Y}_{Y}$.
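As a small illustration of the notation above, the power set $P(X_t)$ of the manipulative variables can be enumerated directly. The sketch below is only an aid to reading the notation table: the variable names "X" and "Z" and the helper `power_set` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def power_set(variables):
    """Return every subset of `variables` as a frozenset, smallest first."""
    items = list(variables)
    return [
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    ]


# Hypothetical manipulative variables at a single time step.
manipulative = ["X", "Z"]
subsets = power_set(manipulative)
```

For n manipulative variables the list has 2^n entries, including the empty set, which is why restricting attention to a smaller collection of intervention sets matters as n grows.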
Kernel Identification Through Transformers
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models, as the chosen kernel determines both the inductive biases and prior support of functions under the GP prior. This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models. Drawing inspiration from recent progress in deep learning, we introduce a novel approach named KITT: Kernel Identification Through Transformers. KITT exploits a transformer-based architecture to generate kernel recommendations in under 0.1 seconds, which is several orders of magnitude faster than conventional kernel search algorithms. We train our model using synthetic data generated from priors over a vocabulary of known kernels. By exploiting the nature of the self-attention mechanism, KITT is able to process datasets with inputs of arbitrary dimension. We demonstrate that kernels chosen by KITT yield strong performance over a diverse collection of regression benchmarks.
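The idea of generating synthetic training data from GP priors over a kernel vocabulary can be sketched as follows. This is a minimal illustration under stated assumptions, not the KITT pipeline: the three-kernel vocabulary, the fixed hyperparameters, the uniform input distribution, and the names `VOCABULARY` and `sample_dataset` are all hypothetical choices made for the example.

```python
import numpy as np


def rbf(x1, x2, lengthscale=1.0):
    # Squared-exponential kernel on inputs of shape (n, d) and (m, d).
    sq_dist = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def periodic(x1, x2, lengthscale=1.0, period=1.0):
    # Product of one-dimensional periodic kernels, one per input dimension.
    diff = x1[:, None, :] - x2[None, :, :]
    sin_sq = np.sum(np.sin(np.pi * diff / period) ** 2, axis=-1)
    return np.exp(-2.0 * sin_sq / lengthscale**2)


def linear(x1, x2):
    # Linear (dot-product) kernel.
    return x1 @ x2.T


# Assumed toy vocabulary; the paper's actual vocabulary is not given here.
VOCABULARY = {"rbf": rbf, "periodic": periodic, "linear": linear}


def sample_dataset(rng, n_points=32, n_dims=2, noise=1e-2):
    """Draw one labelled dataset (x, y, kernel_name) from a GP prior."""
    name = str(rng.choice(sorted(VOCABULARY)))
    x = rng.uniform(-1.0, 1.0, size=(n_points, n_dims))
    # Observation noise on the diagonal also keeps the Cholesky factor stable.
    cov = VOCABULARY[name](x, x) + noise * np.eye(n_points)
    chol = np.linalg.cholesky(cov)
    y = chol @ rng.standard_normal(n_points)
    return x, y, name


rng = np.random.default_rng(0)
x, y, name = sample_dataset(rng)
```

Each call yields inputs, outputs, and the name of the kernel that generated them, so the kernel name can serve as the label a classifier-style model is trained to recover. Because `n_dims` is a free argument, datasets of different input dimension can be mixed in one training set.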
Scalable Quasi-Bayesian Inference for Instrumental Variable Regression
Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work we present a scalable quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models. Contrary to Bayesian modeling for IV, our approach does not require additional assumptions on the data generating process, and leads to a scalable approximate inference algorithm with time cost comparable to the corresponding point estimation methods. Our algorithm can be further extended to work with neural network models. We analyze the theoretical properties of the proposed quasi-posterior, and demonstrate through empirical evaluation the competitive performance of our method.