Bayesian Inference
Algebraic Equivalence of Linear Structural Equation Models
van Ommen, Thijs, Mooij, Joris M.
Despite their popularity, many questions about the algebraic constraints imposed by linear structural equation models remain open problems. For causal discovery, two of these problems are especially important: the enumeration of the constraints imposed by a model, and deciding whether two graphs define the same statistical model. We show how the half-trek criterion can be used to make progress in both of these problems. We apply our theoretical results to a small-scale model selection problem, and find that taking the additional algebraic constraints into account may lead to significant improvements in model selection accuracy.
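As a small illustration of what an algebraic constraint on a linear structural equation model looks like, the sketch below builds the covariance matrix implied by a three-variable chain X1 → X2 → X3 and checks the polynomial that vanishes when X1 and X3 are conditionally independent given X2. The chain graph, the coefficient values, and the helper name `implied_covariance` are assumptions chosen for illustration and are not taken from the paper; the half-trek criterion itself is not implemented here.

```python
import numpy as np

def implied_covariance(Lambda, Omega):
    """Covariance of X solving X = Lambda^T X + eps, with Cov(eps) = Omega.

    Lambda[i, j] is the coefficient of the edge i -> j (hypothetical convention).
    """
    n = Lambda.shape[0]
    inv = np.linalg.inv(np.eye(n) - Lambda)
    return inv.T @ Omega @ inv

# Illustrative chain X1 -> X2 -> X3 with arbitrary edge weights.
a, b = 0.8, -1.5
Lambda = np.array([[0.0, a, 0.0],
                   [0.0, 0.0, b],
                   [0.0, 0.0, 0.0]])
Omega = np.diag([1.0, 2.0, 0.5])  # independent noise variances

Sigma = implied_covariance(Lambda, Omega)

# Polynomial constraint in the entries of Sigma implied by the chain:
# sigma_13 * sigma_22 - sigma_12 * sigma_23 = 0.
constraint = Sigma[0, 2] * Sigma[1, 1] - Sigma[0, 1] * Sigma[1, 2]
print(constraint)
```

The constraint holds for every choice of `a`, `b`, and noise variances in this graph, which is what makes it a property of the model rather than of one parameter setting.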
A Hierarchical Bayesian Linear Regression Model with Local Features for Stochastic Dynamics Approximation
Parsa, Behnoosh, Rajasekaran, Keshav, Meier, Franziska, Banerjee, Ashis G.
One of the challenges in model-based control of stochastic dynamical systems is that the state transition dynamics are involved, and it is not easy or efficient to make good-quality predictions of the states. Moreover, there are not many representational models for the majority of autonomous systems, as it is not easy to build a compact model that captures all of the dynamical subtleties and uncertainties. In this work, we present a hierarchical Bayesian linear regression model with local features to learn the dynamics of a micro-robotic system as well as two simpler examples: a stochastic mass-spring damper and a stochastic double inverted pendulum on a cart. The model is hierarchical since we assume non-stationary priors for the model parameters. These non-stationary priors make the model more flexible by imposing priors on the priors of the model. To solve the maximum likelihood (ML) problem for this hierarchical model, we use the variational expectation maximization (EM) algorithm, and enhance the procedure by introducing hidden target variables. The algorithm yields parsimonious model structures, and consistently provides fast and accurate predictions for all our examples involving large training and test sets. This demonstrates the effectiveness of the method in learning stochastic dynamics, which makes it suitable for future use in a paradigm such as model-based reinforcement learning to compute optimal control policies in real time.
Quantification under prior probability shift: the ratio estimator and its extensions
Vaz, Afonso Fernandes, Izbicki, Rafael, Stern, Rafael Bassi
The quantification problem consists of determining the prevalence of a given label in a target population. However, one often has access to the labels in a sample from the training population but not in the target population. A common assumption in this situation is that of prior probability shift, that is, once the labels are known, the distribution of the features is the same in the training and target populations. In this paper, we derive a new lower bound for the risk of the quantification problem under the prior shift assumption. Complementing this lower bound, we present a new approximately minimax class of estimators, ratio estimators, which generalize several previous proposals in the literature. Using a weaker version of the prior shift assumption, which can be tested, we show that ratio estimators can be used to build confidence intervals for the quantification problem. We also extend the ratio estimator so that it can: (i) incorporate labels from the target population, when they are available and (ii) estimate how the prevalence of positive labels varies according to a function of certain covariates.
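The prior shift assumption implies that, for any score function g, the target mean of g(X) is a mixture of the two class-conditional training means, weighted by the target prevalence. Solving that identity for the prevalence gives a ratio-type estimate. The sketch below is a minimal illustration of this idea for binary labels; the function name `ratio_estimator`, the toy data, and the clipping to [0, 1] are assumptions added here, not details from the paper.

```python
import numpy as np

def ratio_estimator(g_train, y_train, g_target):
    """Estimate the target prevalence of label 1 from scores g(x).

    Under prior shift: E_target[g] = theta * E[g | y=1] + (1 - theta) * E[g | y=0],
    so theta = (E_target[g] - E[g | y=0]) / (E[g | y=1] - E[g | y=0]).
    """
    g_train = np.asarray(g_train, dtype=float)
    y_train = np.asarray(y_train)
    g_target = np.asarray(g_target, dtype=float)

    mean_neg = g_train[y_train == 0].mean()
    mean_pos = g_train[y_train == 1].mean()
    denom = mean_pos - mean_neg
    if denom == 0:
        raise ValueError("g does not separate the classes on average")

    theta = (g_target.mean() - mean_neg) / denom
    # Clipping is a convenience added in this sketch (assumption).
    return float(np.clip(theta, 0.0, 1.0))

# Toy data: g is the output of a hypothetical hard classifier.
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
g_train = [0, 0, 1, 0, 1, 1, 1, 0]   # mean 0.25 for y=0, 0.75 for y=1
g_target = [1] * 6 + [0] * 4         # unlabeled target sample, mean 0.6

theta_hat = ratio_estimator(g_train, y_train, g_target)
print(theta_hat)  # approximately 0.7
```

When g is a hard classifier, as in the toy data, this reduces to correcting the raw fraction of predicted positives by the classifier's true and false positive rates.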
Constraint-based Causal Discovery for Non-Linear Structural Causal Models with Cycles and Latent Confounders
Forré, Patrick, Mooij, Joris M.
We address the problem of causal discovery from data, making use of the recently proposed causal modeling framework of modular structural causal models (mSCM) to handle cycles, latent confounders and non-linearities. We introduce $\sigma$-connection graphs ($\sigma$-CG), a new class of mixed graphs (containing undirected, bidirected and directed edges) with additional structure, and extend the concept of $\sigma$-separation, the appropriate generalization of the well-known notion of d-separation in this setting, to apply to $\sigma$-CGs. We prove the closedness of $\sigma$-separation under marginalisation and conditioning and exploit this to implement a test of $\sigma$-separation on a $\sigma$-CG. This then leads us to the first causal discovery algorithm that can handle non-linear functional relations, latent confounders, cyclic causal relationships, and data from different (stochastic) perfect interventions. As a proof of concept, we show on synthetic data how well the algorithm recovers features of the causal graph of modular structural causal models.
Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks
Bloem-Reddy, Benjamin, Foster, Adam, Mathieu, Emile, Teh, Yee Whye
Empirical evidence suggests that heavy-tailed degree distributions occurring in many real networks are well-approximated by power laws with exponents $\eta$ that may take values either less than or greater than two. Models based on various forms of exchangeability are able to capture power laws with $\eta < 2$, and admit tractable inference algorithms; we draw on previous results to show that $\eta > 2$ cannot be generated by the forms of exchangeability used in existing random graph models. Preferential attachment models generate power law exponents greater than two, but have been of limited use as statistical models due to the inherent difficulty of performing inference in non-exchangeable models. Motivated by this gap, we design and implement inference algorithms for a recently proposed class of models that generates $\eta$ of all possible values. We show that although they are not exchangeable, these models have probabilistic structure amenable to inference. Our methods make a large class of previously intractable models useful for statistical inference.
Pairwise Covariates-adjusted Block Model for Community Detection
One of the most fundamental problems in network study is community detection. The stochastic block model (SBM) is a widely used model for network data, for which various estimation methods have been developed and their community detection consistency established. However, the SBM is restricted by the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. We introduce the pairwise covariates-adjusted stochastic block model (PCABM), a generalization of the SBM that incorporates pairwise covariate information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments. It is shown that both the coefficient estimates of the covariates and the community assignments are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to efficiently solve PCABM. Under certain conditions, we derive the error bound of community estimation under SCWA and show that it is community detection consistent. PCABM compares favorably with the SBM or degree-corrected stochastic block model (DCBM) under a wide range of simulated and real networks when covariate information is accessible.
Quantifying model form uncertainty in Reynolds-averaged turbulence models with Bayesian deep neural networks
Geneva, Nicholas, Zabaras, Nicholas
Data-driven methods for improving turbulence modeling in Reynolds-Averaged Navier-Stokes (RANS) simulations have gained significant interest in the computational fluid dynamics community. Modern machine learning models have opened up a new area of black-box turbulence models allowing for the tuning of RANS simulations to increase their predictive accuracy. While several data-driven turbulence models have been reported, the quantification of the uncertainties introduced has mostly been neglected. Uncertainty quantification for such data-driven models is essential since their predictive capability rapidly declines as they are tested for flow physics that deviate from that in the training data. In this work, we propose a novel data-driven framework that not only improves RANS predictions but also provides probabilistic bounds for fluid quantities such as velocity and pressure. The uncertainties captured include both model form uncertainty and epistemic uncertainty induced by the limited training data. An invariant Bayesian deep neural network is used to predict the anisotropic tensor component of the Reynolds stress. This model is trained using the Stein variational gradient descent algorithm. The computed uncertainty on the Reynolds stress is propagated to the quantities of interest by vanilla Monte Carlo simulation. Results are presented for two test cases that differ geometrically from the training flows at several different Reynolds numbers. The prediction enhancement of the data-driven model is discussed as well as the associated probabilistic bounds for flow properties of interest. Ultimately this framework allows for a quantitative measurement of model confidence and uncertainty quantification for flows in which no high-fidelity observations or prior knowledge is available.
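To give a feel for the Stein variational gradient descent update mentioned above, the following is a toy sketch on a one-dimensional Gaussian target rather than a Bayesian neural network. The target N(2, 1), the fixed RBF bandwidth, the step size, and the function name `svgd_step` are all assumptions chosen for illustration and do not reflect the paper's setup.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles x of shape (n,), using an RBF kernel."""
    diff = x[:, None] - x[None, :]                   # diff[j, i] = x_j - x_i
    k = np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))  # k[j, i] = k(x_j, x_i)
    grad_k = -diff / bandwidth ** 2 * k              # d k(x_j, x_i) / d x_j
    # Attraction toward high density plus repulsion between particles,
    # averaged over all particles j for each particle i.
    phi = (k * grad_log_p(x)[:, None] + grad_k).mean(axis=0)
    return x + step * phi

# Toy target: N(2, 1), whose score function is -(x - 2).
grad_log_p = lambda x: -(x - 2.0)

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=50)  # start far from the target mean
for _ in range(500):
    particles = svgd_step(particles, grad_log_p)

print(particles.mean(), particles.std())
```

The kernel gradient term keeps the particles from collapsing onto the mode, so the final particle set carries spread information as well as a point estimate.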
The modal age of Statistics
The mean-median-mode trio involves the three most frequently used measures of central tendency of a dataset. They are taught within the very first classes of any course on basic Statistics. However, they do not share the same degree of importance: the sample mean (or average) is normally well understood and employed in everyday situations, the sample median is also useful and easy to visualize, but the mode, usually defined as the value of the dataset having the highest frequency of appearance, looks like a more bizarre measure of location. This uneven treatment was already noted by Dalenius (1965), and it persists to some extent today. Indeed, when the dataset consists of realizations from a continuous random variable then all the observed values are different with probability one and, therefore, the mode does not even make much sense.
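The point about continuous data can be seen in a few lines: in the toy sample below every value occurs exactly once, so the frequency-based mode is uninformative. One common workaround, used here purely as an assumption and not necessarily the approach taken in the paper, is to take the maximizer of a kernel density estimate. The sample, the bandwidth, and the name `kde_mode` are illustrative.

```python
from collections import Counter

import numpy as np

def kde_mode(sample, bandwidth, grid_size=1001):
    """Location of the maximum of a Gaussian kernel density estimate on a grid."""
    sample = np.asarray(sample, dtype=float)
    grid = np.linspace(sample.min(), sample.max(), grid_size)
    z = (grid[:, None] - sample[None, :]) / bandwidth
    # Unnormalized density: the normalizing constant does not change the argmax.
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    return float(grid[np.argmax(density)])

sample = [0.9, 0.95, 1.0, 1.05, 1.1, 5.0]

# Every value appears once, so "most frequent value" picks nothing out.
max_count = max(Counter(sample).values())

mode_estimate = kde_mode(sample, bandwidth=0.3)
print(max_count, mode_estimate)
```

The estimate lands near 1.0, where the observations cluster, and is unaffected by the isolated value at 5.0. The result does depend on the chosen bandwidth.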
A Tutorial on Bayesian Optimization
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best suited for optimization over continuous domains of fewer than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
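For the noise-free setting, expected improvement has a well-known closed form when the surrogate's posterior at a point is Gaussian. The sketch below evaluates that closed form for maximization and uses it to choose among a few candidate points. The candidate names and their posterior means and standard deviations are made-up values standing in for the output of a Gaussian process; the tutorial's generalization to noisy evaluations is not implemented here.

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization, given a posterior N(mu, sigma^2).

    `best` is the largest objective value observed so far (noise-free setting).
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf

# Hypothetical posterior (mean, standard deviation) at three candidate points.
candidates = {"a": (1.0, 0.1), "b": (0.8, 0.5), "c": (1.1, 0.0)}
best_observed = 1.0

scores = {name: expected_improvement(m, s, best_observed)
          for name, (m, s) in candidates.items()}
next_point = max(scores, key=scores.get)
print(scores, next_point)
```

Candidate "b" is selected even though its posterior mean is the lowest, because its larger uncertainty leaves more room for improvement. This is the exploration and exploitation trade-off that the acquisition function encodes.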
BALSON: Bayesian Least Squares Optimization with Nonnegative L1-Norm Constraint
Xie, Jiyang, Ma, Zhanyu, Zhang, Guoqiang, Xue, Jing-Hao, Chien, Jen-Tzung, Lin, Zhiqing, Guo, Jun
A Bayesian approach termed BAyesian Least Squares Optimization with Nonnegative L1-norm constraint (BALSON) is proposed. The error distribution of data fitting is described by Gaussian likelihood. The parameter distribution is assumed to be a Dirichlet distribution. With the Bayes rule, searching for the optimal parameters is equivalent to finding the mode of the posterior distribution. In order to explicitly characterize the nonnegative L1-norm constraint of the parameters, we further approximate the true posterior distribution by a Dirichlet distribution. We estimate the statistics of the approximating Dirichlet posterior distribution by sampling methods. Four sampling methods have been introduced. With the estimated posterior distributions, the original parameters can be effectively reconstructed in polynomial fitting problems, and the BALSON framework is found to perform better than conventional methods.