Bayesian Inference
Graphical model inference: Sequential Monte Carlo meets deterministic approximations
Lindsten, Fredrik, Helske, Jouni, Vihola, Matti
Approximate inference in probabilistic graphical models (PGMs) can be grouped into deterministic methods and Monte-Carlo-based methods. The former can often provide accurate and rapid inferences, but are typically associated with biases that are hard to quantify. The latter enjoy asymptotic consistency, but can suffer from high computational costs. In this paper we present a way of bridging the gap between deterministic and stochastic inference. Specifically, we suggest an efficient sequential Monte Carlo (SMC) algorithm for PGMs which can leverage the output from deterministic inference methods. While generally applicable, we show explicitly how this can be done with loopy belief propagation, expectation propagation, and Laplace approximations. The resulting algorithm can be viewed as a post-correction of the biases associated with these methods and, indeed, numerical results show clear improvements over the baseline deterministic methods as well as over "plain" SMC.
Uncertainty-Based Out-of-Distribution Detection in Deep Reinforcement Learning
Sedlmeier, Andreas, Gabor, Thomas, Phan, Thomy, Belzner, Lenz, Linnhoff-Popien, Claudia
We consider the problem of detecting out-of-distribution (OOD) samples in deep reinforcement learning. In a value based reinforcement learning setting, we propose to use uncertainty estimation techniques directly on the agent's value estimating neural network to detect OOD samples. The focus of our work lies in analyzing the suitability of approximate Bayesian inference methods and related ensembling techniques that generate uncertainty estimates. Although prior work has shown that dropout-based variational inference techniques and bootstrap-based approaches can be used to model epistemic uncertainty, the suitability for detecting OOD samples in deep reinforcement learning remains an open question. Our results show that uncertainty estimation can be used to differentiate in- from out-of-distribution samples. Over the complete training process of the reinforcement learning agents, bootstrap-based approaches tend to produce more reliable epistemic uncertainty estimates, when compared to dropout-based approaches.
A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference
Shridhar, Kumar, Laumann, Felix, Liwicki, Marcus
Artificial Neural Networks are connectionist systems that perform a given task by learning on examples without having prior knowledge about the task. This is done by finding an optimal point estimate for the weights in every node. Generally, the network using point estimates as weights perform well with large datasets, but they fail to express uncertainty in regions with little or no data, leading to overconfident decisions. In this paper, Bayesian Convolutional Neural Network (BayesCNN) using Variational Inference is proposed, that introduces probability distribution over the weights. Furthermore, the proposed BayesCNN architecture is applied to tasks like Image Classification, Image Super-Resolution and Generative Adversarial Networks. The results are compared to point-estimates based architectures on MNIST, CIFAR-10 and CIFAR-100 datasets for Image CLassification task, on BSD300 dataset for Image Super Resolution task and on CIFAR10 dataset again for Generative Adversarial Network task. BayesCNN is based on Bayes by Backprop which derives a variational approximation to the true posterior. We, therefore, introduce the idea of applying two convolutional operations, one for the mean and one for the variance. Our proposed method not only achieves performances equivalent to frequentist inference in identical architectures but also incorporate a measurement for uncertainties and regularisation. It further eliminates the use of dropout in the model. Moreover, we predict how certain the model prediction is based on the epistemic and aleatoric uncertainties and empirically show how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases. Finally, we propose ways to prune the Bayesian architecture and to make it more computational and time effective.
Credit Assignment Techniques in Stochastic Computation Graphs
Weber, Thรฉophane, Heess, Nicolas, Buesing, Lars, Silver, David
Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.
Imputation and low-rank estimation with Missing Non At Random data
Sportisse, Aude, Boyer, Claire, Josse, Julie
Preprint submitted to January 8, 2019 the use of Expectation-Maximization (EM) algorithm [8] which allows to get the maximum likelihood estimators in various incomplete-data problems [21]. The theoretical guarantees of these methods ensuring the correct prediction of missing values or the correct estimation of some parameters of interest are only valid if some assumptions are made on how the data came to be missing. Rubin [31] introduced three types of missing-data mechanisms: (i) the restrictive assumptions of missing completely at random (MCAR) data, (ii) the missing at random (MAR) data, where the missing data may only depend on the observable variables, and (iii) the more general assumption of missing not at random (MNAR) data, i.e. when the unavailability of the data depends on the values of other variables and its own value. A classic example of MNAR data, which is the focus of the paper, is surveys where rich people would be less willing to disclose their income or where people would be less incline to answer sensitive questions on their addictive use. Another example would be the diagnosis of Alzheimer's disease, which can be made using a score obtained by the patient on a specific test. However, when a patient has the disease, he or she has difficulty answering questions and is more likely to abandon the test before it ends.
Causality and Bayesian network PDEs for multiscale representations of porous media
Um, Kimoon, Hall, Eric Joseph, Katsoulakis, Markos A., Tartakovsky, Daniel M.
Microscopic (pore-scale) properties of porous media affect and often determine their macroscopic (continuum- or Darcy-scale) counterparts. Understanding the relationship between processes on these two scales is essential to both the derivation of macroscopic models of, e.g., transport phenomena in natural porous media, and the design of novel materials, e.g., for energy storage. Most microscopic properties exhibit complex statistical correlations and geometric constraints, which presents challenges for the estimation of macroscopic quantities of interest (QoIs), e.g., in the context of global sensitivity analysis (GSA) of macroscopic QoIs with respect to microscopic material properties. We present a systematic way of building correlations into stochastic multiscale models through Bayesian networks. This allows us to construct the joint probability density function (PDF) of model parameters through causal relationships that emulate engineering processes, e.g., the design of hierarchical nanoporous materials. Such PDFs also serve as input for the forward propagation of parametric uncertainty; our findings indicate that the inclusion of causal relationships impacts predictions of macroscopic QoIs. To assess the impact of correlations and causal relationships between microscopic parameters on macroscopic material properties, we use a moment-independent GSA based on the differential mutual information. Our GSA accounts for the correlated inputs and complex non-Gaussian QoIs. The global sensitivity indices are used to rank the effect of uncertainty in microscopic parameters on macroscopic QoIs, to quantify the impact of causality on the multiscale model's predictions, and to provide physical interpretations of these results for hierarchical nanoporous materials.
Fast Multi-Class Probabilistic Classifier by Sparse Non-parametric Density Estimation
Chen, Wan-Ping Nicole, Chang, Yuan-chin Ivan
The model interpretation is essential in many application scenarios and to build a classification model with a ease of model interpretation may provide useful information for further studies and improvement. It is common to encounter with a lengthy set of variables in modern data analysis, especially when data are collected in some automatic ways. This kinds of datasets may not collected with a specific analysis target and usually contains redundant features, which have no contribution to a the current analysis task of interest. Variable selection is a common way to increase the ability of model interpretation and is popularly used with some parametric classification models. There is a lack of studies about variable selection in nonparametric classification models such as the density estimation-based methods and this is especially the case for multiple-class classification situations. In this study we study multiple-class classification problems using the thought of sparse non-parametric density estimation and propose a method for identifying high impacts variables for each class. We present the asymptotic properties and the computation procedure for the proposed method together with some suggested sample size. We also repost the numerical results using both synthesized and some real data sets.
Improved and Scalable Online Learning of Spatial Concepts and Language Models with Mapping
Taniguchi, Akira, Hagiwara, Yoshinobu, Taniguchi, Tadahiro, Inamura, Tetsunari
We propose a novel online learning algorithm, called SpCoSLAM 2.0, for spatial concepts and lexical acquisition with high accuracy and scalability. Previously, we proposed SpCoSLAM as an online learning algorithm based on unsupervised Bayesian probabilistic model that integrates multimodal place categorization, lexical acquisition, and SLAM. However, our previous algorithm had limited estimation accuracy owing to the influence of the early stages of learning, and increased computational complexity with added training data. Therefore, we introduce techniques such as fixed-lag rejuvenation to reduce the calculation time while maintaining an accuracy higher than that of the previous algorithm. The results show that, in terms of estimation accuracy, the proposed algorithm exceeds the previous algorithm and is comparable to batch learning. In addition, the calculation time of the proposed algorithm does not depend on the amount of training data and becomes constant for each step of the scalable algorithm. Our approach will contribute to the realization of long-term spatial language interactions between humans and robots.
Machine learning in 10 pictures
I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below is a list I find most illuminating. Plots of polynomials having various orders M, shown as red curves, fitted to the data set generated by the green curve. Why Bayesian inference embodies Occam's razor. This figure gives the basic intuition for why complex models can turn out to be less probable.
Prediction of multi-dimensional spatial variation data via Bayesian tensor completion
This paper presents a multi-dimensional computational method to predict the spatial variation data inside and across multiple dies of a wafer. This technique is based on tensor computation. A tensor is a high-dimensional generalization of a matrix or a vector. By exploiting the hidden low-rank property of a high-dimensional data array, the large amount of unknown variation testing data may be predicted from a few random measurement samples. The tensor rank, which decides the complexity of a tensor representation, is decided by an available variational Bayesian approach. Our approach is validated by a practical chip testing data set, and it can be easily generalized to characterize the process variations of multiple wafers. Our approach is more efficient than the previous virtual probe techniques in terms of memory and computational cost when handling high-dimensional chip testing data.