Goto

Collaborating Authors

 Bayesian Inference


Discrete Variational Autoencoders

arXiv.org Machine Learning

Probabilistic models with discrete latent variables naturally capture datasets composed of discrete classes. However, they are difficult to train efficiently, since backpropagation through discrete variables is generally not possible. We present a novel method to train a class of probabilistic models with discrete latent variables using the variational autoencoder framework, including backpropagation through the discrete latent variables. The associated class of probabilistic models comprises an undirected discrete component and a directed hierarchical continuous component. The discrete component captures the distribution over the disconnected smooth manifolds induced by the continuous component. As a result, this class of models efficiently learns both the class of objects in an image, and their specific realization in pixels, from unsupervised data, and outperforms state-of-the-art methods on the permutation-invariant MNIST, Omniglot, and Caltech-101 Silhouettes datasets.


Integrating Additional Knowledge Into Estimation of Graphical Models

arXiv.org Machine Learning

In applications of graphical models, we typically have more information than just the samples themselves. A prime example is the estimation of brain connectivity networks based on fMRI data, where in addition to the samples themselves, the spatial positions of the measurements are readily available. With particular regard for this application, we are thus interested in ways to incorporate additional knowledge most effectively into graph estimation. Our approach to this is to make neighborhood selection receptive to additional knowledge by strengthening the role of the tuning parameters. We demonstrate that this concept (i) can improve reproducibility, (ii) is computationally convenient and efficient, and (iii) carries a lucid Bayesian interpretation. We specifically show that the approach provides effective estimations of brain connectivity graphs from fMRI data. However, providing a general scheme for the inclusion of additional knowledge, our concept is expected to have applications in a wide range of domains.


Distributed Statistical Estimation and Rates of Convergence in Normal Approximation

arXiv.org Machine Learning

This paper presents new algorithms for distributed statistical estimation that can take advantage of the divide-and-conquer approach. We show that one of the key benefits attained by an appropriate divide-and-conquer strategy is robustness, an important characteristic of large distributed systems. We introduce a class of algorithms that are based on the properties of the geometric median, establish connections between performance of these distributed algorithms and rates of convergence in normal approximation, and provide tight deviations guarantees for resulting estimators in the form of exponential concentration inequalities. Our techniques are illustrated through several examples: in particular, we obtain new results for the median-of-means estimator, as well as provide performance guarantees for robust distributed maximum likelihood estimation.


Deterministic Quantum Annealing Expectation-Maximization Algorithm

arXiv.org Machine Learning

Maximum likelihood estimation (MLE) is one of the most important methods in machine learning, and the expectation-maximization (EM) algorithm is often used to obtain maximum likelihood estimates. However, EM heavily depends on initial configurations and fails to find the global optimum. On the other hand, in the field of physics, quantum annealing (QA) was proposed as a novel optimization approach. Motivated by QA, we propose a quantum annealing extension of EM, which we call the deterministic quantum annealing expectation-maximization (DQAEM) algorithm. We also discuss its advantage in terms of the path integral formulation. Furthermore, by employing numerical simulations, we illustrate how it works in MLE and show that DQAEM outperforms EM.


On the choice of the low-dimensional domain for global optimization via random embeddings

arXiv.org Machine Learning

The challenge of taking many variables into account in optimization problems may be overcome under the hypothesis of low effective dimensionality. Then, the search of solutions can be reduced to the random embedding of a low dimensional space into the original one, resulting in a more manageable optimization problem. Specifically, in the case of time consuming black-box functions and when the budget of evaluations is severely limited, global optimization with random embeddings appears as a sound alternative to random search. Yet, in the case of box constraints on the native variables, defining suitable bounds on a low dimensional domain appears to be complex. Indeed, a small search domain does not guarantee to find a solution even under restrictive hypotheses about the function, while a larger one may slow down convergence dramatically. Here we tackle the issue of low-dimensional domain selection based on a detailed study of the properties of the random embedding, giving insight on the aforementioned difficulties. In particular, we describe a minimal low-dimensional set in correspondence with the embedded search space. We additionally show that an alternative equivalent embedding procedure yields simultaneously a simpler definition of the low-dimensional minimal set and better properties in practice. Finally, the performance and robustness gains of the proposed enhancements for Bayesian optimization are illustrated on three examples.


Applying Bayes Theorem: Simulating the Monty Hall Problem with Python

#artificialintelligence

The Monty Hall problem was first featured on the classic game show "Let's make a Deal". In the final segment of the show, contestants were presented with a choice of three different doors. Behind two of the doors would be a goat, and behind the third would be an extravagant prize such as a car. The contestant begins the game by picking one door. The host, Monty Hall, would then open one of the remaining doors.


Bayesian Hybrid Matrix Factorisation for Data Integration

arXiv.org Machine Learning

We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for data integration, based on combining multiple matrix factorisation methods, that can be used for in- and out-of-matrix prediction of missing values. The model is very general and can be used to integrate many datasets across different entity types, including repeated experiments, similarity matrices, and very sparse datasets. We apply our method on two biological applications, and extensively compare it to state-of-the-art machine learning and matrix factorisation models. For in-matrix predictions on drug sensitivity datasets we obtain consistently better performances than existing methods. This is especially the case when we increase the sparsity of the datasets. Furthermore, we perform out-of-matrix predictions on methylation and gene expression datasets, and obtain the best results on two of the three datasets, especially when the predictivity of datasets is high.


Variational Hamiltonian Monte Carlo via Score Matching

arXiv.org Machine Learning

Traditionally, the field of computational Bayesian statistics has been divided into two main subfields: variational methods and Markov chain Monte Carlo (MCMC). In recent years, however, several methods have been proposed based on combining variational Bayesian inference and MCMC simulation in order to improve their overall accuracy and computational efficiency. This marriage of fast evaluation and flexible approximation provides a promising means of designing scalable Bayesian inference methods. In this paper, we explore the possibility of incorporating variational approximation into a state-of-the-art MCMC method, Hamiltonian Monte Carlo (HMC), to reduce the required gradient computation in the simulation of Hamiltonian flow, which is the bottleneck for many applications of HMC in big data problems. To this end, we use a {\it free-form} approximation induced by a fast and flexible surrogate function based on single-hidden layer feedforward neural networks. The surrogate provides sufficiently accurate approximation while allowing for fast exploration of parameter space, resulting in an efficient approximate inference algorithm. We demonstrate the advantages of our method on both synthetic and real data problems.


Hamiltonian Monte Carlo Acceleration Using Surrogate Functions with Random Bases

arXiv.org Machine Learning

For big data analysis, high computational cost for Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov Chain Monte Carlo (MCMC) methods, namely, Hamiltonian Monte Carlo (HMC). The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the art methods.


Mixture modeling on related samples by $\psi$-stick breaking and kernel perturbation

arXiv.org Machine Learning

There has been great interest recently in applying nonparametric kernel mixtures in a hierarchical manner to model multiple related data samples jointly. In such settings several data features are commonly present: (i) the related samples often share some, if not all, of the mixture components but with differing weights, (ii) only some, not all, of the mixture components vary across the samples, and (iii) often the shared mixture components across samples are not aligned perfectly in terms of their location and spread, but rather display small misalignments either due to systematic cross-sample difference or more often due to uncontrolled, extraneous causes. Properly incorporating these features in mixture modeling will enhance the efficiency of inference, whereas ignoring them not only reduces efficiency but can jeopardize the validity of the inference due to issues such as confounding. We introduce two techniques for incorporating these features in modeling related data samples using kernel mixtures. The first technique, called $\psi$-stick breaking, is a joint generative process for the mixing weights through the breaking of both a stick shared by all the samples for the components that do not vary in size across samples and an idiosyncratic stick for each sample for those components that do vary in size. The second technique is to imbue random perturbation into the kernels, thereby accounting for cross-sample misalignment. These techniques can be used either separately or together in both parametric and nonparametric kernel mixtures. We derive efficient Bayesian inference recipes based on MCMC sampling for models featuring these techniques, and illustrate their work through both simulated data and a real flow cytometry data set in prediction/estimation, cross-sample calibration, and testing multi-sample differences.