Goto

Collaborating Authors

 Bayesian Inference


Particle filter-based Gaussian process optimisation for parameter inference

arXiv.org Machine Learning

We propose a novel method for maximum likelihood-based parameter inference in nonlinear and/or non-Gaussian state space models. The method is an iterative procedure with three steps. At each iteration a particle filter is used to estimate the value of the log-likelihood function at the current parameter iterate. Using these log-likelihood estimates, a surrogate objective function is created by utilizing a Gaussian process model. Finally, we use a heuristic procedure to obtain a revised parameter iterate, providing an automatic trade-off between exploration and exploitation of the surrogate model. The method is profiled on two state space models with good performance both considering accuracy and computational cost.


Venture: a higher-order probabilistic programming platform with programmable inference

arXiv.org Machine Learning

We describe Venture, an interactive virtual machine for probabilistic programming that aims to be sufficiently expressive, extensible, and efficient for general-purpose use. Like Church, probabilistic models and inference problems in Venture are specified via a Turing-complete, higher-order probabilistic language descended from Lisp. Unlike Church, Venture also provides a compositional language for custom inference strategies built out of scalable exact and approximate techniques. We also describe four key aspects of Venture's implementation that build on ideas from probabilistic graphical models. First, we describe the stochastic procedure interface (SPI) that specifies and encapsulates primitive random variables. The SPI supports custom control flow, higher-order probabilistic procedures, partially exchangeable sequences and ``likelihood-free'' stochastic simulators. It also supports external models that do inference over latent variables hidden from Venture. Second, we describe probabilistic execution traces (PETs), which represent execution histories of Venture programs. PETs capture conditional dependencies, existential dependencies and exchangeable coupling. Third, we describe partitions of execution histories called scaffolds that factor global inference problems into coherent sub-problems. Finally, we describe a family of stochastic regeneration algorithms for efficiently modifying PET fragments contained within scaffolds. Stochastic regeneration linear runtime scaling in cases where many previous approaches scaled quadratically. We show how to use stochastic regeneration and the SPI to implement general-purpose inference strategies such as Metropolis-Hastings, Gibbs sampling, and blocked proposals based on particle Markov chain Monte Carlo and mean-field variational inference techniques.


Accelerating MCMC via Parallel Predictive Prefetching

arXiv.org Machine Learning

We present a general framework for accelerating a large class of widely used Markov chain Monte Carlo (MCMC) algorithms. Our approach exploits fast, iterative approximations to the target density to speculatively evaluate many potential future steps of the chain in parallel. The approach can accelerate computation of the target distribution of a Bayesian inference problem, without compromising exactness, by exploiting subsets of data. It takes advantage of whatever parallel resources are available, but produces results exactly equivalent to standard serial execution. In the initial burn-in phase of chain evaluation, it achieves speedup over serial evaluation that is close to linear in the number of available cores.


Hierarchical Block Structures and High-resolution Model Selection in Large Networks

arXiv.org Machine Learning

Discovering and characterizing the large-scale topological features in empirical networks are crucial steps in understanding how complex systems function. However, most existing methods used to obtain the modular structure of networks suffer from serious problems, such as being oblivious to the statistical evidence supporting the discovered patterns, which results in the inability to separate actual structure from noise. In addition to this, one also observes a resolution limit on the size of communities, where smaller but well-defined clusters are not detectable when the network becomes large. This phenomenon occurs not only for the very popular approach of modularity optimization, which lacks built-in statistical validation, but also for more principled methods based on statistical inference and model selection, which do incorporate statistical validation in a formally correct way. Here we construct a nested generative model that, through a complete description of the entire network hierarchy at multiple scales, is capable of avoiding this limitation, and enables the detection of modular structure at levels far beyond those possible with current approaches. Even with this increased resolution, the method is based on the principle of parsimony, and is capable of separating signal from noise, and thus will not lead to the identification of spurious modules even on sparse networks. Furthermore, it fully generalizes other approaches in that it is not restricted to purely assortative mixing patterns, directed or undirected graphs, and ad hoc hierarchical structures such as binary trees. Despite its general character, the approach is tractable, and can be combined with advanced techniques of community detection to yield an efficient algorithm that scales well for very large networks.


Firefly Monte Carlo: Exact MCMC with Subsets of Data

arXiv.org Machine Learning

Markov chain Monte Carlo (MCMC) is a popular and successful general-purpose tool for Bayesian inference. However, MCMC cannot be practically applied to large data sets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC) an auxiliary variable MCMC algorithm that only queries the likelihoods of a potentially small subset of the data at each iteration yet simulates from the exact posterior distribution, in contrast to recent proposals that are approximate even in the asymptotic limit. FlyMC is compatible with a wide variety of modern MCMC algorithms, and only requires a lower bound on the per-datum likelihood factors. In experiments, we find that FlyMC generates samples from the posterior more than an order of magnitude faster than regular MCMC, opening up MCMC methods to larger datasets than were previously considered feasible.


Asymptotically Exact, Embarrassingly Parallel MCMC

arXiv.org Machine Learning

Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition data onto multiple machines. Then, on each machine, any classical MCMC method (e.g., Gibbs sampling) may be used to draw samples from a posterior distribution given the data subset. Finally, the samples from each machine are combined to form samples from the full posterior. This embarrassingly parallel algorithm allows each machine to act independently on a subset of the data (without communication) until the final combination stage. We prove that our algorithm generates asymptotically exact samples and empirically demonstrate its ability to parallelize burn-in and sampling in several models.


Constraint-based Causal Discovery from Multiple Interventions over Overlapping Variable Sets

arXiv.org Machine Learning

Scientific practice typically involves repeatedly studying a system, each time trying to unravel a different perspective. In each study, the scientist may take measurements under different experimental conditions (interventions, manipulations, perturbations) and measure different sets of quantities (variables). The result is a collection of heterogeneous data sets coming from different data distributions. In this work, we present algorithm COmbINE, which accepts a collection of data sets over overlapping variable sets under different experimental conditions; COmbINE then outputs a summary of all causal models indicating the invariant and variant structural characteristics of all models that simultaneously fit all of the input data sets. COmbINE converts estimated dependencies and independencies in the data into path constraints on the data-generating causal model and encodes them as a SAT instance. The algorithm is sound and complete in the sample limit. To account for conflicting constraints arising from statistical errors, we introduce a general method for sorting constraints in order of confidence, computed as a function of their corresponding p-values. In our empirical evaluation, COmbINE outperforms in terms of efficiency the only pre-existing similar algorithm; the latter additionally admits feedback cycles, but does not admit conflicting constraints which hinders the applicability on real data. As a proof-of-concept, COmbINE is employed to co-analyze 4 real, mass-cytometry data sets measuring phosphorylated protein concentrations of overlapping protein sets under 3 different interventions.


Robust Asymmetric Clustering

arXiv.org Machine Learning

Contaminated mixture models are developed for model-based clustering of data with asymmetric clusters as well as spurious points, outliers, and/or noise. Specifically, we introduce a contaminated mixture of contaminated shifted asymmetric Laplace distributions and a contaminated mixture of contaminated skew-normal distributions. In each case, mixture components have a parameter controlling the proportion of bad points (i.e., spurious points, outliers, and/or noise) and one specifying the degree of contamination. A very important feature of our approaches is that these parameters do not have to be specified a priori. Expectation-conditional maximization algorithms are outlined for parameter estimation and the number of components is selected using the Bayesian information criterion. The performance of our approaches is illustrated on artificial and real data.


Bayesian Sample Size Determination of Vibration Signals in Machine Learning Approach to Fault Diagnosis of Roller Bearings

arXiv.org Machine Learning

Sample size determination for a data set is an important statistical process for analyzing the data to an optimum level of accuracy and using minimum computational work. The applications of this process are credible in every domain which deals with large data sets and high computational work. This study uses Bayesian analysis for determination of minimum sample size of vibration signals to be considered for fault diagnosis of a bearing using pre-defined parameters such as the inverse standard probability and the acceptable margin of error. Thus an analytical formula for sample size determination is introduced. The fault diagnosis of the bearing is done using a machine learning approach using an entropy-based J48 algorithm. The following method will help researchers involved in fault diagnosis to determine minimum sample size of data for analysis for a good statistical stability and precision.


Bayesian Inference for NMR Spectroscopy with Applications to Chemical Quantification

arXiv.org Machine Learning

Nuclear magnetic resonance (NMR) spectroscopy exploits the magnetic properties of atomic nuclei to discover the structure, reaction state and chemical environment of molecules. We propose a probabilistic generative model and inference procedures for NMR spectroscopy. Specifically, we use a weighted sum of trigonometric functions undergoing exponential decay to model free induction decay (FID) signals. We discuss the challenges in estimating the components of this general model -- amplitudes, phase shifts, frequencies, decay rates, and noise variances -- and offer practical solutions. We compare with conventional Fourier transform spectroscopy for estimating the relative concentrations of chemicals in a mixture, using synthetic and experimentally acquired FID signals. We find the proposed model is particularly robust to low signal to noise ratios (SNR), and overlapping peaks in the Fourier transform of the FID, enabling accurate predictions (e.g., 1% sensitivity at low SNR) which are not possible with conventional spectroscopy (5% sensitivity).