Goto

Collaborating Authors

 Bayesian Learning


Batch simulations and uncertainty quantification in Gaussian process surrogate-based approximate Bayesian computation

arXiv.org Machine Learning

Surrogate models such as Gaussian processes (GP) have been proposed to accelerate approximate Bayesian computation (ABC) when the statistical model of interest is expensive-to-simulate. In one such promising framework the discrepancy between simulated and observed data is modelled with a GP. So far principled strategies have been proposed only for sequential selection of the simulation locations. To address this limitation, we develop Bayesian optimal design strategies to parallellise the expensive simulations. Current surrogate-based ABC methods also produce only a point estimate of the ABC posterior while there can be substantial additional uncertainty due to the limited budget of simulations. We also address the problem of quantifying the uncertainty of ABC posterior and discuss the connections between our resulting framework called Bayesian ABC, Bayesian quadrature (BQ) and Bayesian optimisation (BO). Experiments with several toy and real-world simulation models demonstrate advantages of the proposed techniques.


Optimal Clustering from Noisy Binary Feedback

arXiv.org Machine Learning

We study the problem of recovering clusters from binary user feedback. Items are grouped into initially unknown non-overlapping clusters. To recover these clusters, the learner sequentially presents to users a finite list of items together with a question with a binary answer selected from a fixed finite set. For each of these items, the user provides a random answer whose expectation is determined by the item cluster and the question and by an item-specific parameter characterizing the hardness of classifying the item. The objective is to devise an algorithm with a minimal cluster recovery error rate. We derive problem-specific information-theoretical lower bounds on the error rate satisfied by any algorithm, for both uniform and adaptive (list, question) selection strategies. For uniform selection, we present a simple algorithm built upon K-means whose performance almost matches the fundamental limits. For adaptive selection, we develop an adaptive algorithm that is inspired by the derivation of the information-theoretical error lower bounds, and in turn allocates the budget in an efficient way. The algorithm learns to select items hard to cluster and relevant questions more often. We compare numerically the performance of our algorithms with or without adaptive selection strategy, and illustrate the gain achieved by being adaptive. Our inference problems are motivated by the problem of solving large-scale labeling tasks with minimal effort put on the users. For example, in some of the recent CAPTCHA systems, users clicks (binary answers) can be used to efficiently label images, by optimally finding the best questions to present.


Evolving Gaussian Process kernels from elementary mathematical expressions

arXiv.org Machine Learning

Choosing the most adequate kernel is crucial in many Machine Learning applications. Gaussian Process is a state-of-the-art technique for regression and classification that heavily relies on a kernel function. However, in the Gaussian Process literature, kernels have usually been either ad hoc designed, selected from a predefined set, or searched for in a space of compositions of kernels which have been defined a priori. In this paper, we propose a Genetic-Programming algorithm that represents a kernel function as a tree of elementary mathematical expressions. By means of this representation, a wider set of kernels can be modeled, where potentially better solutions can be found, although new challenges also arise. The proposed algorithm is able to overcome these difficulties and find kernels that accurately model the characteristics of the data. This method has been tested in several real-world time-series extrapolation problems, improving the state-of-the-art results while reducing the complexity of the kernels.


Dealing with Stochasticity in Biological ODE Models

arXiv.org Machine Learning

Mathematical modeling with Ordinary Differential Equations (ODEs) has proven to be extremely successful in a variety of fields, including biology. However, these models are completely deterministic given a certain set of initial conditions. We convert mathematical ODE models of three benchmark biological systems to Dynamic Bayesian Networks (DBNs). The DBN model can handle model uncertainty and data uncertainty in a principled manner. They can be used for temporal data mining for noisy and missing variables. We apply Particle Filtering algorithm to infer the model variables by re-estimating the models parameters of various biological ODE models. The model parameters are automatically re-estimated using temporal evidence in the form of data streams. The results show that DBNs are capable of inferring the model variables of the ODE model with high accuracy in situations where data is missing, incomplete, sparse and irregular and true values of model parameters are not known.


Probability Theory 101 for Dummies like Me

#artificialintelligence

In the Classical interpretation Probability is the measure of the likelihood that an event will occur in a Random Experiment; In other words, the frequency of the event occurring. Probability is quantified as a number between 0 and 1, where, loosely speaking, 0 indicates impossibility and 1 indicates certainty. The higher the probability of an event, the more likely it is that the event will occur. A simple example is the tossing of a fair (unbiased) coin. Since the coin is fair, the two outcomes ("heads" and "tails") are both equally probable; the probability of "heads" equals the probability of "tails"; and since no other outcomes are possible, the probability of either "heads" or "tails" is 1/2 (which could also be written as 0.5 or 50%).


Element AI makes its BAyesian Active Learning library open source

#artificialintelligence

Element AI's BAyesian Active Learning library (BaaL library) is now open source and available on GitHub. In this article, we briefly describe active learning, its potential use with deep networks and the specific capabilities of our BaaL library. Machine learning applications generally require a huge amount of data, and in many cases, this data cannot be easily acquired. What's more, even when data is readily available, it often is not possible to label it efficiently. Active learning aims at reducing the amount of labelled data needed to train machine learning models.


Distributed Bayesian Computation for Model Choice

#artificialintelligence

We derive a general decomposition of the model evidence that allows an efficient divide-and-conquer calculation on every worker without accessing the data in one single place. The combination of the results requires only minimal communication between the workers and no exchange of data. We illustrate the applicability of our method on several challenging applications and show that the computation time is reduced by several orders of magnitude, incurring only a negligible bias. We show how to apply our approach in a reversible jump setting where an MCMC sampler moves between different models. The rest of our work is structured as follows: we discuss related work in Section 2 before presenting our approach on distributed Bayesian model choice in Section 3. In Section 4 we demonstrate the applicability of our approach on several data sets and models before discussing possible extensions in Section 5.


Variational Auto-encoder Based Bayesian Poisson Tensor Factorization for Sparse and Imbalanced Count Data

arXiv.org Machine Learning

Non-negative tensor factorization models enable predictive analysis on count data. Among them, Bayesian Poisson-Gamma models are able to derive full posterior distributions of latent factors and are less sensitive to sparse count data. However, current inference methods for these Bayesian models adopt restricted update rules for the posterior parameters. They also fail to share the update information to better cope with the data sparsity. Moreover, these models are not endowed with a component that handles the imbalance in count data values. In this paper, we propose a novel variational auto-encoder framework called VAE-BPTF which addresses the above issues. It uses multi-layer perceptron networks to encode and share complex update information. The encoded information is then reweighted per data instance to penalize common data values before aggregated to compute the posterior parameters for the latent factors. Under synthetic data evaluation, VAE-BPTF tended to recover the right number of latent factors and posterior parameter values. It also outperformed current models in both reconstruction errors and latent factor (semantic) coherence across five real-world datasets. Furthermore, the latent factors inferred by VAE-BPTF are perceived to be meaningful and coherent under a qualitative analysis.


Interventional Experiment Design for Causal Structure Learning

arXiv.org Artificial Intelligence

It is known that from purely observational data, a causal DAG is identifiable only up to its Markov equivalence class, and for many ground truth DAGs, the direction of a large portion of the edges will be remained unidentified. The golden standard for learning the causal DAG beyond Markov equivalence is to perform a sequence of interventions in the system and use the data gathered from the interventional distributions. We consider a setup in which given a budget $k$, we design $k$ interventions non-adaptively. We cast the problem of finding the best intervention target set as an optimization problem which aims to maximize the number of edges whose directions are identified due to the performed interventions. First, we consider the case that the underlying causal structure is a tree. For this case, we propose an efficient exact algorithm for the worst-case gain setup, as well as an approximate algorithm for the average gain setup. We then show that the proposed approach for the average gain setup can be extended to the case of general causal structures. In this case, besides the design of interventions, calculating the objective function is also challenging. We propose an efficient exact calculator as well as two estimators for this task. We evaluate the proposed methods using synthetic as well as real data.


A Gentle Introduction to Bayesian Belief Networks

#artificialintelligence

Probabilistic models can define relationships between variables and be used to calculate probabilities. For example, fully conditional models may require an enormous amount of data to cover all possible cases, and probabilities may be intractable to calculate in practice. Simplifying assumptions such as the conditional independence of all random variables can be effective, such as in the case of Naive Bayes, although it is a drastically simplifying step. An alternative is to develop a model that preserves known conditional dependence between random variables and conditional independence in all other cases. Bayesian networks are a probabilistic graphical model that explicitly capture the known conditional dependence with directed edges in a graph model.