Bayesian Inference
Notes on computational-to-statistical gaps: predictions using statistical physics
Bandeira, Afonso S., Perry, Amelia, Wein, Alexander S.
In these notes we describe heuristics to predict computational-to-statistical gaps in certain statistical problems. These are regimes in which the underlying statistical problem is information-theoretically possible although no efficient algorithm exists, rendering the problem essentially unsolvable for large instances. The methods we describe here are based on mature, albeit non-rigorous, tools from statistical physics. These notes are based on a lecture series given by the authors at the Courant Institute of Mathematical Sciences in New York City, on May 16th, 2017.
Copula Variational Bayes inference via information geometry
Variational Bayes (VB), also known as the independent mean-field approximation, has become a popular method for Bayesian network inference in recent years. Its applications are broad, including neural networks, compressed sensing and clustering, to name just a few. In this paper, the independence constraint in VB is relaxed to a conditional constraint class, called a copula in statistics. Since a joint probability distribution always belongs to a copula class, the novel copula VB (CVB) approximation is a generalized form of VB. Via information geometry, we show that the CVB algorithm iteratively projects the original joint distribution onto a copula constraint space until it reaches a local minimum of the Kullback-Leibler (KL) divergence. In this way, all mean-field approximations, e.g. iterative VB, Expectation-Maximization (EM), Iterated Conditional Modes (ICM) and k-means algorithms, are special cases of the CVB approximation. For a generic Bayesian network, an augmented hierarchical form of CVB is also designed. While mean-field algorithms can only return a locally optimal approximation for a correlated network, the augmented CVB network, which is an optimally weighted average of a mixture of simpler network structures, can potentially achieve the globally optimal approximation for the first time. In simulations of Gaussian mixture clustering, the classification accuracy of CVB is shown to be far superior to that of state-of-the-art VB, EM and k-means algorithms.
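Since CVB is presented as a generalization of iterative VB, it may help to see what the plain independent mean-field baseline looks like. The sketch below is standard mean-field VB, not CVB: it uses the textbook conjugate model x_n ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0 tau)), tau ~ Gamma(a0, b0) with a rate parameterization, and alternates closed-form updates of q(mu) and q(tau). The model, the function name `mean_field_gaussian` and its defaults are illustrative assumptions, not taken from the paper.

```python
import random


def mean_field_gaussian(xs, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iters=50):
    """Independent mean-field VB, q(mu, tau) = q(mu) q(tau), for the
    textbook model x_n ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0 tau)),
    tau ~ Gamma(a0, b0) (rate form). Returns (mu_n, lam_n, a_n, b_n)."""
    n = len(xs)
    xbar = sum(xs) / n
    # These two quantities do not depend on the other factor.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0
    e_tau = a0 / b0  # initial guess for E[tau]
    lam_n, b_n = lam0, b0
    for _ in range(iters):
        # Update q(mu) = N(mu_n, 1/lam_n) while holding q(tau) fixed.
        lam_n = (lam0 + n) * e_tau
        # Update q(tau) = Gamma(a_n, b_n) while holding q(mu) fixed.
        var_mu = 1.0 / lam_n
        sq = sum((x - mu_n) ** 2 for x in xs) + n * var_mu
        sq += lam0 * ((mu_n - mu0) ** 2 + var_mu)
        b_n = b0 + 0.5 * sq
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n


random.seed(0)
data = [random.gauss(2.0, 0.5) for _ in range(500)]
mu_n, lam_n, a_n, b_n = mean_field_gaussian(data)
print(mu_n, a_n / b_n)  # posterior mean of mu, and E[tau] under q(tau)
```

Each factor is updated with the other held fixed, which is the independence constraint that CVB relaxes.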
Bayesian model and dimension reduction for uncertainty propagation: applications in random media
Grigo, Constantin, Koutsourelakis, Phaedon-Stelios
Well-established methods for the solution of stochastic partial differential equations (SPDEs) typically struggle in problems with high-dimensional inputs/outputs. Such difficulties are only amplified in large-scale applications where even a few tens of full-order model runs are impracticable. While dimensionality reduction can alleviate some of these issues, it is not known which, and how many, features of the (high-dimensional) input are actually predictive of the (high-dimensional) output. In this paper, we advocate a Bayesian formulation that is capable of performing simultaneous dimension and model-order reduction. It consists of a component that encodes the high-dimensional input into a low-dimensional set of feature functions by employing sparsity-enforcing priors, and a decoding component that makes use of the solution of a coarse-grained model in order to reconstruct that of the full-order model. Both components are represented with latent variables in a probabilistic graphical model and are trained simultaneously using Stochastic Variational Inference methods. The model is capable of quantifying the predictive uncertainty due to the information loss that unavoidably takes place in any model-order/dimension reduction, as well as the uncertainty arising from finite-sized training datasets. We demonstrate its capabilities in the context of random media, where fine-scale fluctuations can give rise to random inputs with tens of thousands of variables. With a few tens of full-order model simulations, the proposed model is capable of identifying salient physical features and producing sharp predictions, under different boundary conditions, of the full output, which itself consists of thousands of components.
Feed-forward Uncertainty Propagation in Belief and Neural Networks
Shekhovtsov, Alexander, Flach, Boris, Busta, Michal
We propose a feed-forward inference method applicable to belief and neural networks. In a belief network, the method estimates an approximate factorized posterior of all hidden units given the input. In neural networks, the method propagates the uncertainty of the input through all the layers. In neural networks with injected noise, the method analytically takes into account the uncertainties resulting from this noise. Such feed-forward analytic propagation is differentiable in the parameters and can be trained end-to-end. Compared to a standard NN, which can be viewed as propagating only the means, we propagate both the mean and the variance. The method can be useful in all scenarios that require knowledge of the neuron statistics, e.g. when dealing with uncertain inputs, when interpreting sigmoid activations as probabilities of Bernoulli units, when training models regularized by injected noise (dropout), or when estimating activation statistics over the dataset (as needed for normalization methods). In the experiments, we show the possible utility of the method in all these tasks as well as its current limitations.
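To make "propagating the mean and variance" concrete, here is a minimal moment-matching sketch for one linear layer followed by a ReLU. It assumes the input units are mutually independent and treats each pre-activation as Gaussian, a common approximation; the paper's exact approximations may differ, and the function names are hypothetical. The ReLU moments use the closed forms E[max(0, X)] = mu Phi(mu/sigma) + sigma phi(mu/sigma) and E[max(0, X)^2] = (mu^2 + sigma^2) Phi(mu/sigma) + mu sigma phi(mu/sigma).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist(0.0, 1.0)


def linear_moments(means, variances, weights, bias):
    """Propagate means/variances through y = W x + b, assuming the
    input units are mutually independent."""
    out_m, out_v = [], []
    for row, b in zip(weights, bias):
        out_m.append(sum(w * m for w, m in zip(row, means)) + b)
        out_v.append(sum(w * w * v for w, v in zip(row, variances)))
    return out_m, out_v


def relu_moments(mu, var):
    """Exact mean and variance of max(0, X) for X ~ N(mu, var), var > 0."""
    sigma = var ** 0.5
    z = mu / sigma
    cdf, pdf = _STD_NORMAL.cdf(z), _STD_NORMAL.pdf(z)
    m1 = mu * cdf + sigma * pdf                    # E[max(0, X)]
    m2 = (mu * mu + var) * cdf + mu * sigma * pdf  # E[max(0, X)^2]
    return m1, max(m2 - m1 * m1, 0.0)


def layer_moments(means, variances, weights, bias):
    """Linear layer then ReLU, treating each pre-activation as Gaussian."""
    pre_m, pre_v = linear_moments(means, variances, weights, bias)
    post = [relu_moments(m, v) for m, v in zip(pre_m, pre_v)]
    return [p[0] for p in post], [p[1] for p in post]


# A standard network would pass only the means [1.0, 2.0] forward.
out_m, out_v = layer_moments([1.0, 2.0], [0.1, 0.2],
                             [[1.0, -1.0], [0.5, 0.5]], [0.5, 0.0])
```

Stacking `layer_moments` calls gives a feed-forward pass that carries a variance alongside every activation.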
Pseudo-marginal Bayesian inference for supervised Gaussian process latent variable models
Gadd, Charles, Wade, Sara, Shah, Akeel, Grammatopoulos, Dimitris
We introduce a Bayesian framework for inference with a supervised version of the Gaussian process latent variable model. The framework overcomes the high correlations between latent variables and hyperparameters by using an unbiased (pseudo-marginal) estimate of the marginal likelihood that approximately integrates over the latent variables. This estimate is used to construct a Markov chain that explores the posterior of the hyperparameters. We demonstrate the procedure on simulated and real examples, showing its ability to capture uncertainty and multimodality of the hyperparameters, and improved uncertainty quantification in predictions when compared with variational inference.
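The core pseudo-marginal idea can be shown on a much smaller problem than a GP-LVM. The sketch below assumes a hypothetical toy model y_i = x_i + e_i with latent x_i ~ N(0, s^2) and noise e_i ~ N(0, 1), and samples the hyperparameter log s with random-walk Metropolis-Hastings. The exact likelihood is replaced by an unbiased Monte Carlo estimate (latents drawn from their prior), and the estimate attached to the current state is reused rather than recomputed, which is what leaves the exact posterior invariant. The N(0, 1) prior on log s, the estimator and all names are illustrative choices, not the paper's.

```python
import math
import random

_LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_marginal_estimate(ys, log_s, n_particles, rng):
    """Log of an unbiased Monte Carlo estimate of p(ys | s) for the toy model
    y_i = x_i + e_i, x_i ~ N(0, s^2), e_i ~ N(0, 1); fresh latent draws
    x ~ N(0, s^2) are used for every data point."""
    s = math.exp(log_s)
    total = 0.0
    for y in ys:
        avg = sum(math.exp(-0.5 * (y - rng.gauss(0.0, s)) ** 2)
                  for _ in range(n_particles)) / n_particles
        if avg <= 0.0:
            return -math.inf
        total += math.log(avg) - _LOG_SQRT_2PI
    return total


def pseudo_marginal_mh(ys, n_iter=1000, n_particles=30, step=0.3, seed=0):
    """Random-walk MH on log s with a N(0, 1) prior on log s, using the
    noisy likelihood estimate in place of the exact likelihood."""
    rng = random.Random(seed)
    log_s = 0.0
    log_post = log_marginal_estimate(ys, log_s, n_particles, rng) - 0.5 * log_s ** 2
    samples = []
    for _ in range(n_iter):
        prop = log_s + rng.gauss(0.0, step)
        prop_post = log_marginal_estimate(ys, prop, n_particles, rng) - 0.5 * prop ** 2
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < prop_post - log_post:
            # Store the estimate with the state; it is never refreshed.
            log_s, log_post = prop, prop_post
        samples.append(log_s)
    return samples


data_rng = random.Random(1)
ys = [data_rng.gauss(0.0, 2.0) + data_rng.gauss(0.0, 1.0) for _ in range(20)]
chain = pseudo_marginal_mh(ys)
```

More particles reduce the variance of the estimate and usually improve mixing, at a higher cost per iteration.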
Safe end-to-end imitation learning for model predictive control
Lee, Keuntaek, Saigol, Kamil, Theodorou, Evangelos
We propose the use of Bayesian networks, which provide both a mean value and an uncertainty estimate as output, to enhance the safety of learned control policies under circumstances in which a test-time input differs significantly from the training set. Our algorithm combines reinforcement learning and end-to-end imitation learning to simultaneously learn a control policy as well as a threshold over the predictive uncertainty of the learned model, with no hand-tuning required. Corrective action, such as a return of control to the model predictive controller or human expert, is taken when the uncertainty threshold is exceeded. We demonstrate that our method is robust to uncertainty resulting from varying system dynamics as well as from partial state observability. As the deployment of deep neural networks as controllers for physical robotic systems becomes more prevalent, the issue of safety within artificial intelligence becomes an increasingly important concern. Recently, the use of end-to-end imitation learning to develop neural network control policies has surged in popularity, due in large part to the ease with which deep models can learn complex dynamics and infer global state from local data while bypassing the need for significant parameter tuning. In contrast, traditional approaches to vision-based control rely on methods such as image segmentation and object detection, classification, labeling, and filtering; often, these methods require significant engineering and tuning.
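The switching rule described above can be sketched as a simple gate: estimate the predictive variance of the learned policy for the current input and, if it exceeds a threshold, hand control back to the fallback controller. In this sketch the threshold is a fixed, hypothetical number, whereas the paper learns it with reinforcement learning; scalar actions and the name `select_action` are also assumptions for illustration.

```python
from statistics import mean, pvariance


def select_action(policy_samples, mpc_action, threshold):
    """Gate between a learned policy and a fallback controller.

    policy_samples: scalar actions drawn from a Bayesian policy for the
    same input (for example, several stochastic forward passes).
    Returns (action, source)."""
    mu = mean(policy_samples)
    var = pvariance(policy_samples, mu)
    if var > threshold:
        # Predictive uncertainty too high: return control to the MPC/expert.
        return mpc_action, "mpc"
    return mu, "policy"


confident = select_action([0.50, 0.52, 0.48], mpc_action=0.0, threshold=0.05)
uncertain = select_action([-1.0, 1.0, -0.8, 0.9], mpc_action=0.0, threshold=0.05)
```

When the samples agree, the policy's mean action is used; when they disagree strongly, the fallback action is returned instead.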
Rectified Gaussian Scale Mixtures and the Sparse Non-Negative Least Squares Problem
Nalci, Alican, Fedorov, Igor, Al-Shoukairi, Maher, Liu, Thomas T., Rao, Bhaskar D.
In this paper, we develop a Bayesian evidence maximization framework to solve the sparse non-negative least squares (S-NNLS) problem. We introduce a family of probability densities referred to as the Rectified Gaussian Scale Mixture (R-GSM) to model the sparsity-enforcing prior distribution for the solution. With a proper choice of the mixing density, the R-GSM prior encompasses a variety of heavy-tailed densities such as the rectified Laplacian and rectified Student-t distributions. We utilize the hierarchical representation induced by the R-GSM prior and develop an evidence maximization framework based on the Expectation-Maximization (EM) algorithm. Using the EM-based method, we estimate the hyperparameters and obtain a point estimate for the solution. We refer to the proposed method as rectified sparse Bayesian learning (R-SBL). We provide four R-SBL variants that offer a range of options for computational complexity and the quality of the E-step computation. These methods include Markov chain Monte Carlo EM, linear minimum mean-square-error estimation, approximate message passing and a diagonal approximation. Using numerical experiments, we show that the proposed R-SBL method outperforms existing S-NNLS solvers in terms of both signal and support recovery performance, and is also highly robust to the structure of the design matrix.
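For reference, the objects named above can be written out as follows. The symbols (y, Phi, gamma, lambda) are chosen here for illustration and the paper's parameterization may differ, for instance if its rectified Gaussian carries a nonzero location parameter. The last line is a worked instance of "a proper choice of the mixing density": an exponential density on the variance gamma turns the zero-location R-GSM into the rectified Laplacian, since the same mixture of unrectified Gaussians gives the Laplacian (lambda/2) e^{-lambda |x|}, and rectification doubles it on x >= 0.

```latex
% S-NNLS: a sparse, non-negative solution of y = Phi x
\min_{\mathbf{x}} \; \|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|_2^2
\quad \text{s.t.} \quad \mathbf{x} \ge \mathbf{0}, \; \mathbf{x} \text{ sparse}

% Rectified Gaussian with zero location (a half-normal density)
\mathcal{N}^{R}(x; 0, \gamma) = \sqrt{\frac{2}{\pi\gamma}}
  \exp\!\left(-\frac{x^2}{2\gamma}\right) \mathbb{1}[x \ge 0]

% R-GSM prior: a scale mixture over the variance gamma
p(x) = \int_0^\infty \mathcal{N}^{R}(x; 0, \gamma)\, p(\gamma)\, d\gamma

% Exponential mixing density gives the rectified Laplacian
p(\gamma) = \frac{\lambda^2}{2} e^{-\lambda^2 \gamma / 2}
\;\Longrightarrow\;
p(x) = \lambda e^{-\lambda x} \, \mathbb{1}[x \ge 0]
```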
Kinetic Compressive Sensing
Scipioni, Michele, Santarelli, Maria F., Landini, Luigi, Catana, Ciprian, Greve, Douglas N., Price, Julie C., Pedemonte, Stefano
Parametric images provide insight into the spatial distribution of physiological parameters, but they are often extremely noisy due to the low SNR of tomographic data. Direct estimation from projections allows accurate noise modeling, improving the results of post-reconstruction fitting. We propose a method, which we name kinetic compressive sensing (KCS), based on a hierarchical Bayesian model and on a novel reconstruction algorithm, that encodes sparsity of kinetic parameters. Parametric maps are reconstructed by maximizing the joint probability with an Iterated Conditional Modes (ICM) approach, alternating the optimization of the activity time series (OS-MAP-OSL) and of the kinetic parameters (MAP-LM). We evaluated the proposed algorithm on a simulated dynamic phantom: a bias/variance study confirmed that direct estimates can improve the quality of parametric maps over post-reconstruction fitting, and showed that the novel sparsity prior can further reduce their variance without affecting bias. Real FDG PET human brain data (Siemens mMR, 40 min) were also processed. The results showed that the proposed KCS-regularized direct method can produce spatially coherent images and parametric maps, with lower spatial noise and better tissue contrast. A GPU-based open source implementation of the algorithm is provided.
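The ICM structure, maximizing the joint probability over one block of unknowns while the other is held fixed, can be illustrated on a deliberately tiny model. The sketch assumes a hypothetical Gaussian toy, not the PET model: noisy measurements y_t ~ N(a_t, sigma_y^2) of an activity series a_t, which in turn follows a_t ~ N(k f_t, sigma_a^2) for one kinetic parameter k with a known basis f_t and a flat prior on k. Both conditional modes are closed-form here, whereas the paper uses OS-MAP-OSL and MAP-LM for the two blocks.

```python
def icm_two_block(ys, fs, sigma_y=1.0, sigma_a=1.0, iters=100):
    """ICM on a toy joint: y_t ~ N(a_t, sigma_y^2), a_t ~ N(k f_t, sigma_a^2),
    flat prior on k. Alternates exact conditional modes of a and of k."""
    prec_y, prec_a = 1.0 / sigma_y ** 2, 1.0 / sigma_a ** 2
    ff = sum(f * f for f in fs)
    k = 0.0
    a = list(ys)
    for _ in range(iters):
        # Block 1: mode of p(a | k, y); separable over time points.
        a = [(prec_y * y + prec_a * k * f) / (prec_y + prec_a)
             for y, f in zip(ys, fs)]
        # Block 2: mode of p(k | a), a least-squares fit of a on f.
        k = sum(at * f for at, f in zip(a, fs)) / ff
    return a, k


fs = [1.0, 0.74, 0.55, 0.41, 0.30, 0.22]  # hypothetical known basis
ys = [2.1, 1.4, 1.2, 0.8, 0.55, 0.5]      # hypothetical measurements
a_map, k_map = icm_two_block(ys, fs)
```

Because this toy joint is quadratic, the alternation converges to the joint mode; for the non-convex objectives met in practice, ICM in general only guarantees a local maximum.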
MLE-induced Likelihood for Markov Random Fields
Due to the intractable partition function, the exact likelihood function for a Markov random field (MRF) can, in many situations, only be approximated. Major approximation approaches include pseudolikelihood and Laplace approximation. In this paper, we propose a novel way of approximating the likelihood function by first approximating the marginal likelihood functions of individual parameters and then reconstructing the joint likelihood function from these marginal likelihood functions. For approximating the marginal likelihood functions, we derive a particular likelihood function from a modified coin-tossing scenario, which is useful for capturing how one parameter interacts with the remaining parameters in the likelihood function. For reconstructing the joint likelihood function, we use an appropriate copula to link these marginal likelihood functions. Numerical investigation suggests the superior performance of our approach. In particular, as the size of the MRF increases, both the numerical performance and the computational cost of our approach remain consistently satisfactory, whereas Laplace approximation deteriorates and pseudolikelihood becomes computationally unbearable.
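The reconstruction step, linking marginals into a joint through a copula, follows Sklar's construction f(x1, x2) = c(F1(x1), F2(x2)) f1(x1) f2(x2). The abstract does not say which copula is used, so the sketch below assumes a bivariate Gaussian copula with correlation rho and treats each marginal as a normalized density with a known CDF (NormalDist objects here); all names and numbers are hypothetical. The copula density is c(u) = |R|^(-1/2) exp(-z^T (R^(-1) - I) z / 2) with z_i = Phi^(-1)(u_i).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist(0.0, 1.0)


def gaussian_copula_density(u1, u2, rho):
    """Bivariate Gaussian copula density c(u1, u2) with correlation rho.
    Requires 0 < u1, u2 < 1 and |rho| < 1."""
    z1, z2 = _STD_NORMAL.inv_cdf(u1), _STD_NORMAL.inv_cdf(u2)
    det = 1.0 - rho * rho
    # z^T (R^{-1} - I) z for R = [[1, rho], [rho, 1]]
    quad = (rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / det
    return math.exp(-0.5 * quad) / math.sqrt(det)


def joint_from_marginals(x1, x2, marg1, marg2, rho):
    """Sklar: f(x1, x2) = c(F1(x1), F2(x2)) * f1(x1) * f2(x2).
    marg1, marg2 are objects exposing .cdf and .pdf (NormalDist here)."""
    c = gaussian_copula_density(marg1.cdf(x1), marg2.cdf(x2), rho)
    return c * marg1.pdf(x1) * marg2.pdf(x2)


# Hypothetical marginals for two parameters, linked with rho = 0.5.
marg_a, marg_b = NormalDist(0.2, 0.1), NormalDist(-1.0, 0.3)
value = joint_from_marginals(0.25, -0.9, marg_a, marg_b, rho=0.5)
```

With standard normal marginals this construction reproduces the bivariate normal density exactly, which is a convenient sanity check; with other marginals the same copula yields a non-Gaussian joint.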