Bayesian Inference


Pseudo-Marginal Hamiltonian Monte Carlo

arXiv.org Machine Learning

Bayesian inference in the presence of an intractable likelihood function is computationally challenging. When following a Markov chain Monte Carlo (MCMC) approach to approximate the posterior distribution in this context, one typically uses either MCMC schemes that target the joint posterior of the parameters and some auxiliary latent variables, or pseudo-marginal Metropolis-Hastings (MH) schemes, which mimic an MH algorithm targeting the marginal posterior of the parameters by replacing the intractable likelihood with an unbiased estimate. In scenarios where the parameters and auxiliary variables are strongly correlated under the posterior and/or this posterior is multimodal, Gibbs sampling or Hamiltonian Monte Carlo (HMC) will perform poorly, and the pseudo-marginal MH algorithm, like any other MH scheme, will be inefficient for high-dimensional parameters. We propose here an original MCMC algorithm, termed pseudo-marginal HMC, which approximates the HMC algorithm targeting the marginal posterior of the parameters. We demonstrate through experiments that pseudo-marginal HMC can significantly outperform both standard HMC and pseudo-marginal MH schemes.
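To make the pseudo-marginal idea concrete, here is a minimal sketch of the pseudo-marginal MH baseline mentioned in the abstract, not the paper's pseudo-marginal HMC algorithm. The toy model is an assumption chosen for illustration: a prior theta ~ N(0, 1), a latent z ~ N(theta, 1) and an observation y ~ N(z, 1). The function names and tuning values (`n_particles`, `step`) are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def likelihood_estimate(theta, y, n_particles, rng):
    """Unbiased Monte Carlo estimate of p(y | theta) in the assumed toy model.

    Averages N(y; z, 1) over latent draws z ~ N(theta, 1).
    """
    total = 0.0
    for _ in range(n_particles):
        z = rng.gauss(theta, 1.0)
        total += normal_pdf(y, z, 1.0)
    return total / n_particles


def pseudo_marginal_mh(y, n_iters=20000, n_particles=16, step=1.0, seed=0):
    """Random-walk MH in which the likelihood is replaced by its estimate."""
    rng = random.Random(seed)
    theta = 0.0
    like_hat = likelihood_estimate(theta, y, n_particles, rng)
    samples = []
    for _ in range(n_iters):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        like_hat_prop = likelihood_estimate(proposal, y, n_particles, rng)
        # The estimate for the current state is reused, never refreshed;
        # this is what keeps the marginal posterior as the invariant target.
        ratio = (normal_pdf(proposal, 0.0, 1.0) * like_hat_prop) / (
            normal_pdf(theta, 0.0, 1.0) * like_hat
        )
        if rng.random() < ratio:
            theta, like_hat = proposal, like_hat_prop
        samples.append(theta)
    return samples
```

In this toy model the marginal likelihood is available in closed form, N(y; theta, 2), so the exact posterior mean is y / 3, and the sample mean of `pseudo_marginal_mh(1.5)` should land near 0.5. That makes the sketch easy to sanity-check before swapping in a genuinely intractable model.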


A Classification Framework for Partially Observed Dynamical Systems

arXiv.org Machine Learning

We present a general framework for classifying partially observed dynamical systems based on the idea of learning in the model space. In contrast to existing approaches that use model point estimates to represent individual data items, we employ posterior distributions over models, thus taking into account in a principled manner the uncertainty due to both the generative (observational and/or dynamic noise) and observation (sampling in time) processes. We evaluate the framework on two testbeds: a biological pathway model and a stochastic double-well system. Crucially, we show that the classifier performance is not impaired when the model class used for inferring posterior distributions is much simpler than the observation-generating model class, provided the reduced-complexity inferential model class captures the essential characteristics needed for the given classification task.


An optimal learning method for developing personalized treatment regimes

arXiv.org Machine Learning

A treatment regime is a function that maps individual patient information to a recommended treatment, hence explicitly incorporating the heterogeneity in need for treatment across individuals. Patient responses are dichotomous and can be predicted through an unknown relationship that depends on the patient information and the selected treatment. The goal is to find the treatments that lead to the best patient responses on average. Each experiment is expensive, forcing us to learn the most from each experiment. We adopt a Bayesian approach both to incorporate possible prior information and to update our treatment regime continuously as information accrues, with the potential to allow smaller yet more informative trials and for patients to receive better treatment. By formulating the problem as contextual bandits, we introduce a knowledge gradient policy to guide the treatment assignment by maximizing the expected value of information, for which an approximation method is used to overcome computational challenges. We provide a detailed study on how to make sequential medical decisions under uncertainty to reduce health care costs on a real-world knee replacement dataset. We use clustering and LASSO to deal with the intrinsic sparsity in health datasets. We show experimentally that even though the problem is sparse, through careful selection of physicians (versus picking them at random), we can significantly improve the success rates.


Bayesian machine learning - FastML

#artificialintelligence

So you know the Bayes rule. How does it relate to machine learning? It can be quite difficult to grasp how the puzzle pieces fit together - we know it took us a while. This article is an introduction we wish we had back then. While we have some grasp on the matter, we're not experts, so the following might contain inaccuracies or even outright errors. Feel free to point them out, either in the comments or privately.


Automatic Variational ABC

arXiv.org Machine Learning

Approximate Bayesian Computation (ABC) is a framework for performing likelihood-free posterior inference for simulation models. Stochastic variational inference (SVI) is an appealing alternative to the inefficient sampling approaches commonly used in ABC. However, SVI is highly sensitive to the variance of the gradient estimators, and this problem is exacerbated by approximating the likelihood. We draw upon recent advances in variance reduction for SVI [6][13] and likelihood-free inference using deterministic simulations [12] to produce low-variance gradient estimators of the variational lower bound. By then exploiting automatic differentiation libraries [8], we can avoid nearly all model-specific derivations. We demonstrate performance on three problems and compare to existing SVI algorithms. Our results demonstrate the correctness and efficiency of our algorithm.
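For readers new to ABC, the following is a minimal sketch of plain rejection ABC, the kind of sampling approach the abstract contrasts with SVI. It is not the paper's variational method. The simulator, the Uniform(0, 10) prior, the summary statistic (a sample mean) and the tolerance `eps` are all assumptions made for illustration.

```python
import random


def simulate_summary(theta, n_obs, rng):
    """Hypothetical simulator: draw n_obs values from N(theta, 1), return their mean."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs


def abc_rejection(s_obs, n_draws=20000, n_obs=20, eps=0.1, seed=0):
    """Keep prior draws whose simulated summary falls within eps of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 10.0)  # assumed Uniform(0, 10) prior
        if abs(simulate_summary(theta, n_obs, rng) - s_obs) < eps:
            accepted.append(theta)
    return accepted
```

Calling `abc_rejection(3.0)` returns draws concentrated around 3.0, but only a small fraction of the 20000 simulations is kept. That low acceptance rate is the inefficiency that motivates alternatives such as SVI.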


Bootstrap-Based Regularization for Low-Rank Matrix Estimation

arXiv.org Machine Learning

We develop a flexible framework for low-rank matrix estimation that allows us to transform noise models into regularization schemes via a simple bootstrap algorithm. Effectively, our procedure seeks an autoencoding basis for the observed matrix that is stable with respect to the specified noise model; we call the resulting procedure a stable autoencoder. In the simplest case, with an isotropic noise model, our method is equivalent to a classical singular value shrinkage estimator. For non-isotropic noise models (e.g., Poisson noise), the method does not reduce to singular value shrinkage, and instead yields new estimators that perform well in experiments. Moreover, by iterating our stable autoencoding scheme, we can automatically generate low-rank estimates without specifying the target rank as a tuning parameter.


Expectation propagation for continuous time stochastic processes

arXiv.org Machine Learning

Physical and technological processes frequently exhibit intrinsic stochasticity. The main mathematical framework to describe and reason about such systems is provided by the theory of continuous time (Markovian) stochastic processes. Such processes have been well studied in chemical physics for several decades as models of chemical reactions at very low concentrations [e.g., Gardiner, 1985]. More recently, the theory has found novel and diverse areas of application including systems biology at the single cell level [Wilkinson, 2011], ecology [Volkov et al., 2007] and performance modelling in computer systems [Hillston, 2005], to name but a few. The popularity of the approach has been greatly enhanced by the availability of efficient and accurate simulation algorithms [Gillespie, 1977; Gillespie et al., 2013], which permit a numerical solution of medium-sized systems within a reasonable time frame. As with most of science, many of the application domains of continuous time stochastic processes are becoming increasingly data-rich, creating a critical demand for inference algorithms which can use data to calibrate the models and analyse the uncertainty in the predictions. This raises new challenges and opportunities for statistics and machine learning, and has motivated the development of several algorithms for efficient inference in these systems. In this paper, we focus on the Bayesian approach, and formulate the inverse problem in terms of obtaining an approximation to a posterior distribution over the stochastic process, given observations of the system and using existing scientific information to build a prior model of the process.
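As an illustration of the simulation algorithms cited above, here is a minimal sketch of Gillespie's stochastic simulation algorithm for a birth-death process. The process is an assumed example, not one taken from the paper: births occur at a constant rate and each individual dies at a fixed per-capita rate. The function name and parameters are hypothetical.

```python
import random


def gillespie_birth_death(x0, birth_rate, death_rate, t_max, seed=0):
    """Simulate a birth-death process exactly up to time t_max.

    Births occur at rate birth_rate, deaths at rate death_rate * x.
    Returns the jump times and the state after each jump.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        rates = (birth_rate, death_rate * x)
        total = rates[0] + rates[1]
        if total == 0.0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_max:
            break
        # Choose which reaction fires, with probability proportional to its rate.
        if rng.random() * total < rates[0]:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states
```

For this process the stationary distribution is Poisson with mean `birth_rate / death_rate`, so a long run such as `gillespie_birth_death(0, 10.0, 1.0, 500.0)` should fluctuate around 10. Trajectories like these are the kind of forward simulation that inference methods for such models have to be calibrated against.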


History of Data Mining

#artificialintelligence

Data mining is everywhere, but its story starts many years before Moneyball and Edward Snowden. The following are major milestones and "firsts" in the history of data mining plus how it's evolved and blended with data science and big data. Data mining is the computational process of exploring and uncovering patterns in large data sets a.k.a. It is fundamental to data mining and probability, since it allows understanding of complex realities based on estimated probabilities. The goal of regression analysis is to estimate the relationships among variables, and the specific method they used in this case is the method of least squares.


Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems): Ian H. Witten, Eibe Frank: 9780120884070: Amazon.com: Books

@machinelearnbot

This book is very easy to read and understand. Unlike Hastie's Statistical Learning book, it is not geared towards those with an expert-level knowledge of statistics, and instead takes time to explain functions and formulas for the person with a decent but not extraordinary understanding of statistical/math concepts. For example, their description of a Gaussian was the clearest I've seen. On the other hand, if your math/statistics background is considerable, you may find this book somewhat simplistic or tedious. The book has good coverage of techniques and algorithms, although I was somewhat disappointed that they do not mention Influence Diagrams, considering the amount of coverage of both decision trees and Bayesian techniques.


Dynamic Hierarchical Dirichlet Process for Abnormal Behaviour Detection in Video

arXiv.org Machine Learning

This paper proposes a novel dynamic Hierarchical Dirichlet Process topic model that considers the dependence between successive observations. Conventional posterior inference algorithms for this kind of model require several passes over the whole dataset, which is computationally intractable for massive or sequential data. We design batch and online inference algorithms, based on Gibbs sampling, for the proposed model. The online algorithm makes it possible to process sequential data, incrementally updating the model with each new observation. The model is applied to abnormal behaviour detection in video sequences. A new abnormality measure is proposed for decision making. The proposed method is compared with the method based on the non-dynamic Hierarchical Dirichlet Process, for which we also derive the online Gibbs sampler and the abnormality measure. The results with synthetic and real data show that accounting for the dynamics in a topic model improves the classification performance for abnormal behaviour detection.