 Bayesian Learning


Job opportunities (The University of Manchester)

@machinelearnbot

This is an exciting opportunity for a researcher at post-doctoral level with experience of machine learning and data mining. You will work with senior data scientists based within the local NHS trusts, the University of Manchester Health eResearch Centre, and Health Innovation Manchester to automate data extraction of predetermined features for all patients diagnosed with ovarian and colorectal cancer in the conurbation. Machine learning tools including neural networks, support vector machines and naïve Bayes algorithms will be refined and tested using the datasets accrued and optimised for clinical practice. Accuracy of prediction will be assessed using predefined criteria. Knowledge of cancer treatment would be useful but is not essential, as the team has extensive expertise in this area.


Provable Algorithms for Inference in Topic Models

arXiv.org Machine Learning

Recently, there has been considerable progress on designing algorithms with provable guarantees (typically using linear algebraic methods) for parameter learning in latent variable models. But designing provable algorithms for inference has proven more challenging. Here we take a first step towards provable inference in topic models. We leverage a property of topic models that enables us to construct simple linear estimators for the unknown topic proportions that have small variance, and consequently can work with short documents. Our estimators also correspond to finding an estimate around which the posterior is well concentrated. We also prove lower bounds showing that, for sufficiently short documents, it can be information-theoretically impossible to find the hidden topics. Finally, we give empirical results demonstrating that our algorithm works on realistic topic models: it yields good solutions on synthetic data and runs in time comparable to a single iteration of Gibbs sampling.
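To make the idea of a "linear estimator for topic proportions" concrete, here is a minimal sketch, assuming the topic-word matrix A is already known (one column per topic, each column a distribution over the vocabulary). It uses the simplest choice of linear estimator, the Moore-Penrose pseudo-inverse B = pinv(A), which satisfies B @ A = I when A has full column rank, so B applied to the document's word frequencies is an unbiased estimate of the topic proportions. This is not the paper's estimator: the paper chooses the linear map specifically to keep the variance small on short documents, whereas the pseudo-inverse can be noisy there. The function name, the synthetic Dirichlet topics, and the very long synthetic document are all hypothetical.

```python
import numpy as np

def linear_topic_estimate(A, word_counts):
    """Estimate topic proportions as B @ (word frequencies), with B = pinv(A).

    Assumes A (vocab_size x n_topics) is known and has full column rank,
    so B @ A = I and the estimate is unbiased (it may fall slightly outside the simplex).
    """
    B = np.linalg.pinv(A)
    freqs = word_counts / word_counts.sum()
    return B @ freqs

# Hypothetical example: 3 sparse-ish topics over a 50-word vocabulary
rng = np.random.default_rng(0)
V, K = 50, 3
A = rng.dirichlet(np.full(V, 0.1), size=K).T   # columns sum to 1
theta = np.array([0.5, 0.3, 0.2])
doc = rng.multinomial(100_000, A @ theta)      # one long synthetic document
theta_hat = linear_topic_estimate(A, doc)
print(theta_hat)
```

With a long document the estimate lands close to theta; shortening the document (say to a few dozen words) makes the variance of this naive estimator visible, which is the regime the paper targets.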


Combinatorial Topic Models using Small-Variance Asymptotics

arXiv.org Machine Learning

Topic models have emerged as fundamental tools in unsupervised machine learning. Most modern topic modeling algorithms take a probabilistic view and derive inference algorithms based on Latent Dirichlet Allocation (LDA) or its variants. In contrast, we study topic modeling as a combinatorial optimization problem, and propose a new objective function derived from LDA by passing to the small-variance limit. We minimize the derived objective by using ideas from combinatorial optimization, which results in a new, fast, and high-quality topic modeling algorithm. In particular, we show that our results are competitive with popular LDA-based topic modeling approaches, and also discuss the (dis)similarities between our approach and its probabilistic counterparts.


Bayes classifier and Naive Bayes tutorial (using the MNIST dataset) - Lazy Programmer

#artificialintelligence

The Naive Bayes classifier is a simple classifier that is often used as a baseline for comparison with more complex classifiers. We will use the famous MNIST data set (pre-processed via PCA and normalized) for this tutorial, so our class labels are {0, 1, …, 9}. Bayes' rule states that P(C | X) = P(X | C) P(C) / P(X). If you're like me, you may have found this notation a little confusing at first. We can read the left side P(C | X) as "the probability that the class is C given the data X". We can read P(X | C) on the right side as "the probability of observing the data X given that the class is C" (this is called the "likelihood"). So we can compute the probability that the class is 0 given the data, the probability that the class is 1 given the data, and so on, just by computing the probability of the data under each class (how well the data fits a model of each class), weighted by the class prior P(C). The denominator P(X) is the same for every class, so it does not affect which class wins.
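The recipe above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming each feature is modelled by an independent Gaussian per class (a common choice for real-valued features such as PCA components, but an assumption here, not necessarily the tutorial's exact model). The class name `GaussianNaiveBayes`, the `var_smoothing` parameter, and the two-cluster toy data standing in for MNIST are all illustrative. Classification picks the class that maximizes log P(C) + Σ_d log P(x_d | C), which is Bayes' rule with P(X) dropped.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with an independent Gaussian likelihood per feature (an assumption)."""

    def fit(self, X, y, var_smoothing=1e-9):
        self.classes_ = np.unique(y)
        # Class priors P(C) and per-class feature means/variances for P(X | C)
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_smoothing
        return self

    def _log_joint(self, X):
        # log P(C) + sum_d log N(x_d | mean_cd, var_cd), shape (n_samples, n_classes)
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * self.var_) + diff**2 / self.var_, axis=2)
        return log_lik + self.log_prior_

    def predict_proba(self, X):
        lj = self._log_joint(X)
        lj -= lj.max(axis=1, keepdims=True)  # stabilise before exponentiating
        p = np.exp(lj)
        return p / p.sum(axis=1, keepdims=True)  # normalising plays the role of P(X)

    def predict(self, X):
        return self.classes_[np.argmax(self._log_joint(X), axis=1)]

# Toy two-class data standing in for the PCA-reduced MNIST features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
model = GaussianNaiveBayes().fit(X, y)
print((model.predict(X) == y).mean())
```

Working in log space avoids underflow when many per-feature likelihoods are multiplied, which matters once there are dozens of PCA components.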


Exact Exponent in Optimal Rates for Crowdsourcing

arXiv.org Machine Learning

In many machine learning applications, crowdsourcing has become the primary means for label collection. In this paper, we study the optimal error rate for aggregating labels provided by a set of non-expert workers. Under the classic Dawid-Skene model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m>\frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply the optimality of various EM algorithms for crowdsourcing initialized by consistent estimators.
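As a rough worked example of the sample size requirement, the sketch below computes the Chernoff information C(P0, P1) = -min over λ in [0, 1] of log Σ_x P0(x)^λ P1(x)^(1-λ) on a grid, and plugs an average of per-worker values into m > log(1/ε) / I. It assumes a simplified binary "one-coin" worker who reports the true label with probability `accuracy`, and it treats I(π) as the plain average of per-worker Chernoff informations. Both are illustrative assumptions; the paper's exact definition of I(π) under the general Dawid-Skene model may differ. All function names are hypothetical.

```python
import numpy as np

def chernoff_information(p0, p1, n_grid=1001):
    """C(P0, P1) = -min_{lam in [0,1]} log sum_x P0(x)^lam * P1(x)^(1-lam), via grid search."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    lams = np.linspace(0.0, 1.0, n_grid)
    vals = [np.log(np.sum(p0**lam * p1**(1 - lam))) for lam in lams]
    return -min(vals)

def one_coin_chernoff(accuracy):
    # Hypothetical symmetric binary worker: label distributions under true class 0 vs 1
    return chernoff_information([accuracy, 1 - accuracy], [1 - accuracy, accuracy])

def workers_needed(accuracies, eps):
    # Assumption: I(pi) approximated by the plain average of per-worker Chernoff informations
    I = np.mean([one_coin_chernoff(a) for a in accuracies])
    # Smallest integer m with m > log(1/eps) / I
    return int(np.floor(np.log(1 / eps) / I)) + 1

print(workers_needed([0.8] * 5, eps=0.01))  # → 21
```

For a symmetric worker the minimum sits at λ = 1/2, giving C = -log(2√(p(1-p))); with p = 0.8 that is -log 0.8 ≈ 0.223, so reaching ε = 0.01 needs m > log(100)/0.223 ≈ 20.6, i.e. 21 such workers.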


Partition Functions from Rao-Blackwellized Tempered Sampling

arXiv.org Machine Learning

Partition functions of probability distributions are important quantities for model evaluation and comparisons. We present a new method to compute partition functions of complex and multimodal distributions. Such distributions are often sampled using simulated tempering, which augments the target space with an auxiliary inverse temperature variable. Our method exploits the multinomial probability law of the inverse temperatures, and provides estimates of the partition function in terms of a simple quotient of Rao-Blackwellized marginal inverse temperature probability estimates, which are updated while sampling. We show that the method has interesting connections with several alternative popular methods, and offers some significant advantages. In particular, we empirically find that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBM); moreover, the method is sufficiently accurate to track training and validation log-likelihoods during learning of RBMs, at minimal computational cost.
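The core quotient can be illustrated on a toy problem small enough to check exactly. This is a minimal sketch, assuming a 10-state system with energies E(x), a uniform prior r_k over K inverse temperatures (β = 0 first, where Z is just the number of states), and initial guesses Ẑ_k = 1. A Gibbs sampler alternates x | k and k | x on the joint q(x, k) ∝ r_k exp(-β_k E(x)) / Ẑ_k. Instead of counting visits to each k, it averages the full conditional q(k | x) (the Rao-Blackwellized estimate ĉ_k), and then Z_k / Z_1 ≈ (Ẑ_k / Ẑ_1)(ĉ_k / r_k) / (ĉ_1 / r_1). The paper also updates Ẑ_k while sampling; this sketch does a single pass with fixed guesses, and all names are hypothetical.

```python
import numpy as np

def rts_log_partition(energies, betas, n_iter=20000, seed=0):
    """Estimate log Z_k - log Z_1 for each inverse temperature via Rao-Blackwellized tempering."""
    rng = np.random.default_rng(seed)
    K = len(betas)
    r = np.full(K, 1.0 / K)                 # prior over temperature indices (assumed uniform)
    log_zhat = np.zeros(K)                  # fixed initial guesses log Z_hat_k = 0
    log_f = -np.outer(betas, energies)      # unnormalised log f_k(x), shape (K, n_states)
    k = 0
    c_sum = np.zeros(K)
    for _ in range(n_iter):
        # Sample x | k exactly (the state space is tiny)
        p_x = np.exp(log_f[k] - log_f[k].max())
        p_x /= p_x.sum()
        x = rng.choice(len(energies), p=p_x)
        # Full conditional q(k | x) ∝ r_k f_k(x) / Z_hat_k
        log_w = np.log(r) + log_f[:, x] - log_zhat
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        c_sum += w                          # Rao-Blackwellization: accumulate q(k | x), not the draw
        k = rng.choice(K, p=w)
    c_hat = c_sum / n_iter                  # estimates the marginal q(k)
    # Z_k / Z_1 = (Z_hat_k / Z_hat_1) * (c_k / r_k) / (c_1 / r_1)
    return (log_zhat - log_zhat[0]) + np.log(c_hat / r) - np.log(c_hat[0] / r[0])

energies = np.linspace(0.0, 3.0, 10)        # toy 10-state system
betas = np.linspace(0.0, 1.0, 5)            # inverse temperatures, beta = 0 first
est = rts_log_partition(energies, betas)
exact = np.array([np.log(np.sum(np.exp(-b * energies))) for b in betas])
exact -= exact[0]
print(est, exact)
```

Because each term q(k | x) is a full probability vector rather than a one-hot visit count, every sample informs every temperature, which is where the variance reduction comes from.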


Dropout as a Bayesian Approximation: Appendix

arXiv.org Machine Learning

Zoubin Ghahramani

We show that a neural network with arbitrary depth and non-linearities, with dropout applied before every weight layer, is mathematically equivalent to an approximation to a well-known Bayesian model. This interpretation might offer an explanation for some of dropout's key properties, such as its robustness to overfitting. Our interpretation allows us to reason about uncertainty in deep learning, and allows the introduction of the Bayesian machinery into existing deep learning frameworks in a principled way. This document is an appendix for the main paper "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" by Gal and Ghahramani, 2015 (http://arxiv.org/abs/1506.02142).
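In practice this interpretation is usually applied as "Monte Carlo dropout": keep dropout switched on at test time, run several stochastic forward passes, and use the sample mean and variance of the outputs as a predictive mean and an uncertainty estimate. The sketch below assumes a small ReLU network given as a list of hypothetical, untrained `(W, b)` pairs, with inverted-dropout scaling and dropout applied to the input of every weight layer. It omits the extra observation-noise term (the inverse model precision τ⁻¹) that the main paper adds to the predictive variance.

```python
import numpy as np

def mc_dropout_predict(x, weights, p_drop=0.5, n_samples=100, seed=0):
    """Predictive mean and variance from n_samples stochastic forward passes with dropout ON."""
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        h = x
        for i, (W, b) in enumerate(weights):
            # Dropout on the input of every weight layer, with inverted-dropout scaling
            mask = rng.random(h.shape) >= p_drop
            h = (h * mask) / (1.0 - p_drop)
            h = h @ W + b
            if i < len(weights) - 1:
                h = np.maximum(h, 0.0)      # ReLU on hidden layers only
        outputs.append(h)
    outputs = np.stack(outputs)             # shape (n_samples, batch, out_dim)
    return outputs.mean(axis=0), outputs.var(axis=0)

# Hypothetical untrained 3 -> 16 -> 1 network, just to show the mechanics
rng = np.random.default_rng(1)
weights = [(rng.normal(size=(3, 16)), np.zeros(16)), (rng.normal(size=(16, 1)), np.zeros(1))]
x = rng.normal(size=(5, 3))
mean, var = mc_dropout_predict(x, weights, p_drop=0.5)
```

Setting `p_drop=0.0` makes every pass identical, so the variance collapses to zero: the uncertainty estimate comes entirely from the dropout masks.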


Bayesian Model Selection of Stochastic Block Models

arXiv.org Machine Learning

Abstract: A central problem in analyzing networks is partitioning them into modules or communities. One of the best tools for this is the stochastic block model, which clusters vertices into blocks with statistically homogeneous patterns of links. Despite its flexibility and popularity, there has been a lack of principled statistical model selection criteria for the stochastic block model. Here we propose a Bayesian framework for choosing the number of blocks, as well as for comparing the standard model against the more elaborate degree-corrected block model, ultimately leading to a universal model selection framework capable of comparing multiple modeling combinations. We also investigate its connection to the minimum description length principle.

I. Introduction

An important task in network analysis is community detection, or finding groups of similar vertices which can then be analyzed separately [1]. Community structures offer clues to the processes which generated the graph, on scales ranging from face-to-face social interaction [2] through social-media communications [3] to the organization of food webs [4]. Previous work, however, often defines a "community" narrowly, as a group of vertices with a high density of connections within the group and a low density of connections to the rest of the network. While this type of assortative community structure is generally the case in social networks, we are interested in a more general definition of functional community: a group of vertices that connect to the rest of the network in similar ways. A set of similar predators forms a functional group in a food web, not because they eat each other, but because they feed on similar prey.
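One building block of Bayesian model selection for block models is the marginal likelihood of the adjacency matrix given a partition, with the block-pair edge probabilities integrated out. The sketch below assumes an undirected simple graph and a plain Bernoulli stochastic block model with an independent Beta(1, 1) prior on each block-pair probability; under those assumptions each block pair contributes B(e + 1, N - e + 1) / B(1, 1), where e is its number of edges and N its number of vertex pairs. A full framework like the one described above would also place priors on the partition and the number of blocks, and would handle the degree-corrected variant; none of that is shown here, and the function names are hypothetical.

```python
from itertools import combinations
from math import lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def sbm_log_evidence(n, edges, partition):
    """log p(A | partition) for a Bernoulli SBM with Beta(1, 1) priors on block-pair probabilities."""
    edge_set = {frozenset(e) for e in edges}
    counts = {}  # (r, s) -> [edges present, vertex pairs]
    for u, v in combinations(range(n), 2):
        key = tuple(sorted((partition[u], partition[v])))
        c = counts.setdefault(key, [0, 0])
        c[0] += frozenset((u, v)) in edge_set
        c[1] += 1
    # Beta-Bernoulli marginal likelihood for each block pair
    return sum(log_beta(e + 1, m - e + 1) - log_beta(1, 1) for e, m in counts.values())

# Two 4-cliques joined by a single edge
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
one_block = [0] * 8
two_blocks = [0] * 4 + [1] * 4
print(sbm_log_evidence(8, edges, one_block), sbm_log_evidence(8, edges, two_blocks))
```

On this graph the two-block partition has much higher evidence than the single-block one, because integrating out the probabilities rewards blocks whose link patterns are homogeneous (all edges inside each clique, almost none between them).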


Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models

arXiv.org Machine Learning

The future predictive performance of a Bayesian model can be estimated using Bayesian cross-validation. In this article, we consider Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation (EP). We study the properties of several Bayesian leave-one-out (LOO) cross-validation approximations that in most cases can be computed at a small additional cost after forming the posterior approximation given the full data. Our main objective is to assess the accuracy of the approximate LOO cross-validation estimators. That is, for each method (Laplace and EP) we compare the fast approximate computation with the exact brute-force LOO computation. Secondarily, we evaluate the accuracy of the Laplace and EP approximations themselves against a ground truth established through extensive Markov chain Monte Carlo simulation. Our empirical results show that the approach based on a Gaussian approximation to the LOO marginal distribution (the so-called cavity distribution) gives the most accurate and reliable results among the fast methods.
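The contrast between "fast" and "brute-force" LOO is easiest to see in the one case where everything is exact: Gaussian process regression with a Gaussian likelihood, where the posterior is already Gaussian and no Laplace or EP approximation is needed. There, with K_y = K + σ²I, a standard identity gives the LOO predictive for y_i from the full-data inverse: mean y_i - [K_y⁻¹ y]_i / [K_y⁻¹]_ii and variance 1 / [K_y⁻¹]_ii. The sketch below, assuming an RBF kernel with fixed hyperparameters and synthetic sine data (all illustrative), checks that identity against refitting n times. The non-Gaussian models studied above need the cavity-based approximations instead.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def fast_loo(K, y, noise):
    """LOO predictive means/variances of y_i from one inverse of the full-data K_y."""
    Kinv = np.linalg.inv(K + noise * np.eye(len(y)))
    alpha = Kinv @ y
    diag = np.diag(Kinv)
    return y - alpha / diag, 1.0 / diag

def brute_loo(K, y, noise):
    """Exact LOO by conditioning on the other n - 1 points, once per held-out point."""
    n = len(y)
    Ky = K + noise * np.eye(n)
    means, variances = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        A = Ky[np.ix_(keep, keep)]
        k = Ky[keep, i]
        means[i] = k @ np.linalg.solve(A, y[keep])
        variances[i] = Ky[i, i] - k @ np.linalg.solve(A, k)
    return means, variances

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=20)
K = rbf_kernel(X, X)
mu_fast, var_fast = fast_loo(K, y, noise=0.1)
mu_brute, var_brute = brute_loo(K, y, noise=0.1)
```

The fast version costs one matrix inverse; the brute-force version costs n solves, which is the gap the approximations above try to close for non-Gaussian likelihoods.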