Goto

Collaborating Authors

 Bayesian Inference


Black-Box Saliency Map Generation Using Bayesian Optimisation

arXiv.org Artificial Intelligence

Anna Sergeevna Bosman Department of Computer Science University of Pretoria Pretoria, South Africa anna.bosman@up.ac.za Abstract --Saliency maps are often used in computer vision to provide intuitive interpretations of what input regions a model has used to produce a specific prediction. A number of approaches to saliency map generation are available, but most require access to model parameters. This work proposes an approach for saliency map generation for black-box models, where no access to model parameters is available, using a Bayesian optimisation sampling method. The approach aims to find the global salient image region responsible for a particular (black-box) model's prediction. This is achieved by a sampling-based approach to model perturbations that seeks to localise salient regions of an image to the black-box model. Results show that the proposed approach to saliency map generation outperforms grid-based perturbation approaches, and performs similarly to gradient-based approaches which require access to model parameters. I NTRODUCTION Deep learning (DL) techniques have become a standard approach in computer vision. Specifically, the convolutional neural network (CNN) architecture has shown exceptional performance, achieving results comparable to human performance on image recognition tasks [1]-[3]. As a result, the CNN models are often deployed in real life as efficient black-box tools.


Towards a Kernel based Physical Interpretation of Model Uncertainty

arXiv.org Machine Learning

This paper introduces a new information theoretic framework that provides a sensitive multi-modal quantification of data uncertainty by imposing a quantum physical description of its metric space. We specifically work with the kernel mean embedding metric which, apart from rendering a statistically rich data-induced representation of the signal's PDF in the RKHS, yields an intuitive physical interpretation of the signal as a potential field, resulting in its new energy based formulation. This enables one to extract multi-scale uncertainty features of data in the form of information eigenmodes by utilizing moment decomposition concepts of quantum physics. In essence, we decompose local realizations of the signal's PDF in terms of quantum uncertainty moments. Owing to its kernel basis and multi-modal nature, we postulate that such a framework would serve as a powerful surrogate tool for quantifying model uncertainty. We therefore specifically present the application of this framework as a non-parametric and non-intrusive surrogate tool for predictive uncertainty quantification of point-prediction neural network models, overcoming various limitations of conventional Bayesian and ensemble based UQ methods. Experimental comparisons with some established uncertainty quantification methods illustrate performance advantages exhibited by our framework.


Maximum likelihood estimation and uncertainty quantification for Gaussian process approximation of deterministic functions

arXiv.org Machine Learning

Despite the ubiquity of the Gaussian process regression model, few theoretical results are available that account for the fact that parameters of the covariance kernel typically need to be estimated from the dataset. This article provides one of the first theoretical analyses in the context of Gaussian process regression with a noiseless dataset. Specifically, we consider the scenario where the scale parameter of a Sobolev kernel (such as a Mat\'ern kernel) is estimated by maximum likelihood. We show that the maximum likelihood estimation of the scale parameter alone provides significant adaptation against misspecification of the Gaussian process model in the sense that the model can become "slowly" overconfident at worst, regardless of the difference between the smoothness of the data-generating function and that expected by the model. The analysis is based on a combination of techniques from nonparametric regression and scattered data interpolation. Empirical results are provided in support of the theoretical findings.


The Case for Bayesian Deep Learning

arXiv.org Machine Learning

The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior, or Bayes rule. Bayesian inference is especially compelling for deep neural networks. (1) Neural networks are typically underspecified by the data, and can represent many different but high performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for both calibration and accuracy. (2) Deep ensembles have been mistaken as competing approaches to Bayesian methods, but can be seen as approximate Bayesian marginalization. (3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize. (4) The observed correlation between parameters in flat regions of the loss and a diversity of solutions that provide good generalization is further conducive to Bayesian marginalization, as flat regions occupy a large volume in a high dimensional space, and each different solution will make a good contribution to a Bayesian model average. (5) Recent practical advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training, while retaining scalability.


Bayesian Reasoning with Deep-Learned Knowledge

arXiv.org Artificial Intelligence

We access the internalized understanding of trained, deep neural networks to perform Bayesian reasoning on complex tasks. Independently trained networks are arranged to jointly answer questions outside their original scope, which are formulated in terms of a Bayesian inference problem. We solve this approximately with variational inference, which provides uncertainty on the outcomes. We demonstrate how following tasks can be approached this way: Combining independently trained networks to sample from a conditional generator, solving riddles involving multiple constraints simultaneously, and combine deep-learned knowledge with conventional noisy measurements in the context of high-resolution images of human faces.


Interventions and Counterfactuals in Tractable Probabilistic Models: Limitations of Contemporary Transformations

arXiv.org Artificial Intelligence

In recent years, there has been an increasing interest in studying causality-related properties in machine learning models generally, and in generative models in particular. While that is well motivated, it inherits the fundamental computational hardness of probabilistic inference, making exact reasoning intractable. Probabilistic tractable models have also recently emerged, which guarantee that conditional marginals can be computed in time linear in the size of the model, where the model is usually learned from data. Although initially limited to low tree-width models, recent tractable models such as sum product networks (SPNs) and probabilistic sentential decision diagrams (PSDDs) exploit efficient function representations and also capture high tree-width models. In this paper, we ask the following technical question: can we use the distributions represented or learned by these models to perform causal queries, such as reasoning about interventions and counterfactuals? By appealing to some existing ideas on transforming such models to Bayesian networks, we answer mostly in the negative. We show that when transforming SPNs to a causal graph interventional reasoning reduces to computing marginal distributions; in other words, only trivial causal reasoning is possible. For PSDDs the situation is only slightly better. We first provide an algorithm for constructing a causal graph from a PSDD, which introduces augmented variables. Intervening on the original variables, once again, reduces to marginal distributions, but when intervening on the augmented variables, a deterministic but nonetheless causal-semantics can be provided for PSDDs.


Tutorial #5: variational autoencoders

#artificialintelligence

The goal of the variational autoencoder (VAE) is to learn a probability distribution $Pr(\mathbf{x})$ over a multi-dimensional variable $\mathbf{x}$. There are two main reasons for modelling distributions. First, we might want to draw samples (generate) from the distribution to create new plausible values of $\mathbf{x}$. Second, we might want to measure the likelihood that a new vector $\mathbf{x} {*}$ was created by this probability distribution. In fact, it turns out that the variational autoencoder is well-suited to the former task but not for the latter. It is common to talk about the variational autoencoder as if it is the model of $Pr(\mathbf{x})$. However, this is misleading; the variational autoencoder is a neural architecture that is designed to help learn the model for $Pr(\mathbf{x})$.


Dynamic clustering of time series data

arXiv.org Machine Learning

We propose a new method for clustering multivariate time-series data based on Dynamic Linear Models. Whereas usual time-series clustering methods obtain static membership parameters, our proposal allows each time-series to dynamically change their cluster memberships over time. In this context, a mixture model is assumed for the time series and a flexible Dirichlet evolution for mixture weights allows for smooth membership changes over time. Posterior estimates and predictions can be obtained through Gibbs sampling, but a more efficient method for obtaining point estimates is presented, based on Stochastic Expectation-Maximization and Gradient Descent. Finally, two applications illustrate the usefulness of our proposed model to model both univariate and multivariate time-series: World Bank indicators for the renewable energy consumption of EU nations and the famous Gapminder dataset containing life-expectancy and GDP per capita for various countries.


The Indian Chefs Process

arXiv.org Machine Learning

This paper introduces the Indian Chefs Process (ICP), a Bayesian nonparametric prior on the joint space of infinite directed acyclic graphs (DAGs) and orders that generalizes Indian Buffet Processes. As our construction shows, the proposed distribution relies on a latent Beta Process controlling both the orders and outgoing connection probabilities of the nodes, and yields a probability distribution on sparse infinite graphs. The main advantage of the ICP over previously proposed Bayesian nonparametric priors for DAG structures is its greater flexibility. To the best of our knowledge, the ICP is the first Bayesian nonparametric model supporting every possible DAG. We demonstrate the usefulness of the ICP on learning the structure of deep generative sigmoid networks as well as convolutional neural networks.


December 2019: "Top 40" New R Packages

#artificialintelligence

One hundred fifty-two packages made it to CRAN in December. Here are my "Top 40" picks in ten categories: Data, Genomics, Machine Learning, Mathematics, Medicine, Science, Statistics, Time Series, Utilities, and Visualization. Look here for more information as well as the vignette. Loads and creates spatial data, including layers and tools that are relevant to the activities of the Commission for the Conservation of Antarctic Marine Living Resources ( CCAMLR). Have a look at the vignette.