 Bayesian Inference


Model-Based Multiple Instance Learning

arXiv.org Machine Learning

While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.


Loan Prediction – Using PCA and Naive Bayes Classification with R

@machinelearnbot

So, it is very important to predict the loan type and loan amount based on the banks' data. In this blog post, we discuss how a Naive Bayes classification model in R can be used to predict loans. Because customer data contain more than two independent variables, the results are difficult to plot: two dimensions are needed to visualize how machine learning models work.


Automatic Selection of t-SNE Perplexity

arXiv.org Machine Learning

In practice, proper tuning of t-SNE perplexity requires users to understand the inner working of the method as well as to have hands-on experience. We propose a model selection objective for t-SNE perplexity that requires negligible extra computation beyond that of the t-SNE itself. We empirically validate that the perplexity settings found by our approach are consistent with preferences elicited from human experts across a number of datasets. The similarities of our approach to Bayesian information criteria (BIC) and minimum description length (MDL) are also analyzed.


A probabilistic model for the numerical solution of initial value problems

arXiv.org Machine Learning

In recent years, the search for numerical algorithms which return probability distributions over the solution for a given numerical problem has become an active area of research [25]. Several models and methods have been proposed for the solution of initial value problems (IVPs) [57, 7, 51, 9, 31, 61]. However, these probabilistic algorithms have no immediate connection to the extensive literature on this task in numerical analysis. Most importantly, such inference algorithms do not come with convergence analysis out of the box. The methods in [7, 9, 61] have convergence results, but their respective implementations are based on sampling schemes and, thus, do not offer guarantees for individual runs. The methods in [51, 31] offer a deterministic execution and an analytical guarantee for the first step, but we will show that this guarantee is lacking for the whole integration domain. In this paper, we present a class of probabilistic solvers which combine properties of the standard and the probabilistic algorithms. We formulate desiderata that users might have for a probabilistic numerical algorithm.


Communication-Free Parallel Supervised Topic Models

arXiv.org Machine Learning

Embarrassingly (communication-free) parallel Markov chain Monte Carlo (MCMC) methods are commonly used in learning graphical models. However, MCMC cannot be directly applied in learning topic models because of the quasi-ergodicity problem caused by multimodal distribution of topics. In this paper, we develop an embarrassingly parallel MCMC algorithm for sLDA. Our algorithm works by switching the order of sampled topics combination and labeling variable prediction in sLDA, in which it overcomes the quasi-ergodicity problem because high-dimension topics that follow a multimodal distribution are projected into one-dimension document labels that follow a unimodal distribution. Our empirical experiments confirm that the out-of-sample prediction performance using our embarrassingly parallel algorithm is comparable to non-parallel sLDA while the computation time is significantly reduced.


The Multivariate Generalised von Mises distribution: Inference and applications

arXiv.org Machine Learning

Circular variables arise in a multitude of data-modelling contexts ranging from robotics to the social sciences, but they have been largely overlooked by the machine learning community. This paper partially redresses this imbalance by extending some standard probabilistic modelling tools to the circular domain. First, we introduce a new multivariate distribution over circular variables, called the multivariate Generalised von Mises (mGvM) distribution. This distribution can be constructed by restricting and renormalising a general multivariate Gaussian distribution to the unit hyper-torus. Previously proposed multivariate circular distributions are shown to be special cases of this construction. Second, we introduce a new probabilistic model for circular regression that is inspired by Gaussian Processes, and a method for probabilistic principal component analysis with circular hidden variables. These models can leverage standard modelling tools (e.g. covariance functions and methods for automatic relevance determination). Third, we show that the posterior distribution in these models is an mGvM distribution, which enables development of an efficient variational free-energy scheme for performing approximate inference and approximate maximum-likelihood learning.


Variational Bayesian inference for linear and logistic regression

arXiv.org Machine Learning

The article describes the model, derivation, and implementation of variational Bayesian inference for linear and logistic regression, both with and without automatic relevance determination. It has the dual function of acting as a tutorial for the derivation of variational Bayesian inference for simple models, and of documenting, with brief examples, the MATLAB functions that implement this inference. These functions are freely available online.

1. Introduction

Linear and logistic regression are essential workhorses of statistical analysis, whose Bayesian treatment has received much recent attention (Gelman et al., 2013; Bishop, 2006; Murphy, 2012; Hastie et al., 2011). These treatments allow specifying a-priori uncertainty and inferring a-posteriori uncertainty about regression coefficients explicitly and hierarchically, for example by specifying how uncertain we are a priori that these coefficients are small. However, Bayesian inference in such hierarchical models quickly becomes intractable, so recent effort has focused on approximate inference, such as Markov Chain Monte Carlo methods (Gilks et al., 1995) or variational Bayesian approximation (Beal, 2003; Bishop, 2006; Murphy, 2012). Here, we describe such a variational treatment and implementation of Bayesian hierarchical models for both linear and logistic regression. Even though neither the statistical models nor their Bayesian approximation are particularly novel, the article provides a tutorial-style introduction to the derivation of their algorithms, together with a MATLAB implementation of these algorithms.


Delayed acceptance ABC-SMC

arXiv.org Machine Learning

Approximate Bayesian computation (ABC) is now an established technique for statistical inference used in cases where the likelihood function is computationally expensive or not available. It relies on the use of a model that is specified in the form of a simulator, and approximates the likelihood at a parameter $\theta$ by simulating auxiliary data sets $x$ and evaluating the distance of $x$ from the true data $y$. However, ABC is not computationally feasible in cases where using the simulator for each $\theta$ is very expensive. This paper investigates this situation in cases where a cheap, but approximate, simulator is available. The approach is to employ delayed acceptance Markov chain Monte Carlo (MCMC) within an ABC sequential Monte Carlo (SMC) sampler in order to, in a first stage of the kernel, use the cheap simulator to rule out parts of the parameter space that are not worth exploring, so that the "true" simulator is only run (in the second stage of the kernel) where there is a reasonable chance of accepting proposed values of $\theta$. We show that this approach can be used quite automatically, with the only tuning parameter choice additional to ABC-SMC being the number of particles we wish to carry through to the second stage of the kernel. Applications to stochastic differential equation models and latent doubly intractable distributions are presented.
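The basic ABC idea described in this abstract (simulate auxiliary data $x$ at a proposed $\theta$ and compare it with the observed data $y$) can be illustrated with plain rejection ABC. The following is a minimal sketch only, not the delayed acceptance ABC-SMC sampler of the paper: the Gaussian simulator, the uniform prior, the sample-mean distance, and the tolerance `eps` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical simulator: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def distance(x, y):
    """Distance between two data sets, measured on their sample means."""
    return abs(statistics.fmean(x) - statistics.fmean(y))


def abc_rejection(y, n_proposals, eps, rng):
    """Keep prior draws of theta whose simulated data fall within eps of y."""
    accepted = []
    for _ in range(n_proposals):
        theta = rng.uniform(-5.0, 5.0)      # assumed uniform prior on theta
        x = simulate(theta, len(y), rng)    # auxiliary data set x
        if distance(x, y) < eps:            # accept only if x is close to y
            accepted.append(theta)
    return accepted


rng = random.Random(0)
y = simulate(2.0, 50, rng)                  # "observed" data, true theta = 2
posterior_sample = abc_rejection(y, 5000, 0.1, rng)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

Every proposal here costs one run of the simulator, which is the expense the abstract refers to when the simulator is slow.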


A network approach to topic models

arXiv.org Machine Learning

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a collection of documents. Despite their success (in particular that of their most widely used variant, Latent Dirichlet Allocation (LDA)) and numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, e.g. a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. Here, we approach the problem of identifying topical structures by representing text corpora as bipartite networks of documents and words and using methods from community detection in complex networks, in particular stochastic block models (SBM). We show that our SBM-based approach constitutes a more principled and versatile framework for topic modeling, solving the intrinsic limitations of Dirichlet-based models through a more general choice of nonparametric priors. It automatically detects the number of topics and hierarchically clusters both the words and documents. In practice, we demonstrate through the analysis of artificial and real corpora that our approach outperforms LDA in terms of statistical model selection.


Learning Model Reparametrizations: Implicit Variational Inference by Fitting MCMC distributions

arXiv.org Machine Learning

Consider a probabilistic model with joint distribution $p(x, z)$, where $x$ are data and $z$ are latent variables and/or random parameters. Suppose that exact inference in $p(x, z)$ is intractable, which means that the posterior distribution $p(z \mid x) = p(x, z) / \int p(x, z)\,dz$ is difficult to compute due to the normalizing constant $p(x) = \int p(x, z)\,dz$, which represents the probability of the data and is known as the evidence or marginal likelihood. The marginal likelihood is essential for estimation of any extra parameters in $p(x)$ or for model comparison. Approximate inference algorithms aim to approximate $p(z \mid x)$ and/or $p(x)$. Two general frameworks, which we briefly review next, are based on Markov chain Monte Carlo (MCMC) [33, 2] and variational inference (VI) [17, 40].
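To make the role of the normalizing constant concrete, the sketch below computes the evidence and the posterior for a toy model in which the integral over z is one-dimensional and can be done numerically. The model (z ~ Normal(0, 1), x given z ~ Normal(z, 1)), the integration grid, and all function names are assumptions for illustration; they are not taken from the paper. For this toy model the exact answers are known (the evidence is Normal(x; 0, 2) and the posterior is Normal(z; x/2, 1/2)), so the numerical result can be checked.

```python
import math


def normal_pdf(v, mean, var):
    """Density of a Normal(mean, var) distribution at v."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def joint(x, z):
    """p(x, z) = p(z) p(x | z) for the assumed toy model."""
    return normal_pdf(z, 0.0, 1.0) * normal_pdf(x, z, 1.0)


def evidence(x, lo=-10.0, hi=10.0, steps=4000):
    """Approximate p(x), the integral of p(x, z) over z, by a midpoint sum."""
    dz = (hi - lo) / steps
    return sum(joint(x, lo + (i + 0.5) * dz) for i in range(steps)) * dz


def posterior(z, x):
    """p(z | x) = p(x, z) / p(x)."""
    return joint(x, z) / evidence(x)


x_obs = 1.0
# Numerical evidence next to the closed form Normal(x; 0, 2).
print(evidence(x_obs), normal_pdf(x_obs, 0.0, 2.0))
# Numerical posterior density at z = 0.5 next to Normal(z; x/2, 1/2).
print(posterior(0.5, x_obs), normal_pdf(0.5, x_obs / 2.0, 0.5))
```

A grid like this is only workable because z is a scalar here. When z is high-dimensional the integral cannot be evaluated this way, which is the intractability that motivates MCMC and VI.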