 Undirected Networks


Markov Chain Monte Carlo - Nice R Code

#artificialintelligence

This topic doesn't have much to do with nicer code, but there is probably some overlap in interest. However, some of the topics that we cover arise naturally here, so read on! MCMC is simply an algorithm for sampling from a distribution. The term stands for "Markov Chain Monte Carlo", because it is a type of "Monte Carlo" (i.e., a random) method that uses "Markov chains" (we'll discuss these later). MCMC is just one type of Monte Carlo method, although it is possible to view many other commonly used methods as simply special cases of MCMC.
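To make "an algorithm for sampling from a distribution" concrete, here is a minimal sketch of random-walk Metropolis, one common MCMC method. It is not code from the linked article. The function names, the standard-normal target, the step size, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def log_standard_normal(x):
    """Log-density of N(0, 1) up to an additive constant (illustrative target)."""
    return -0.5 * x * x


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density only needs to be known up to an additive constant.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a move by adding symmetric Gaussian noise to the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # The current state is recorded whether or not the move was accepted.
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = metropolis(log_standard_normal, start=0.0, n_samples=20000)
    mean = sum(draws) / len(draws)
    print("sample mean:", mean)
```

Each new state depends only on the current one, which is what makes the sequence of draws a Markov chain. The draws are correlated, so more of them are needed than with independent sampling to reach the same accuracy.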


Uniform $\varepsilon$-Stability of Distributed Nonlinear Filtering over DNAs: Gaussian-Finite HMMs

arXiv.org Machine Learning

In this work, we study stability of distributed filtering of Markov chains with finite state space, partially observed in conditionally Gaussian noise. We consider a nonlinear filtering scheme over a Distributed Network of Agents (DNA), which relies on the distributed evaluation of the likelihood part of the centralized nonlinear filter and is based on a particular specialization of the Alternating Direction Method of Multipliers (ADMM) for fast average consensus. Assuming the same number of consensus steps between any two consecutive noisy measurements for each sensor in the network, we fully characterize a minimal number of such steps, such that the distributed filter remains uniformly stable with a prescribed accuracy level, $\varepsilon \in (0,1]$, within a finite operational horizon, $T$, and across all sensors. Stability is in the sense of the $\ell_1$-norm between the centralized and distributed versions of the posterior at each sensor, and at each time within $T$. Roughly speaking, our main result shows that uniform $\varepsilon$-stability of the distributed filtering process depends only loglinearly on $T$ and (roughly) the size of the network, and only logarithmically on $1/\varepsilon$. If this total loglinear bound is fulfilled, any additional consensus iterations will incur a fully quantified further exponential decay in the consensus error. Our bounds are universal, in the sense that they are independent of the particular structure of the Gaussian Hidden Markov Model (HMM) under consideration.


Handbook of Markov Chain Monte Carlo

@machinelearnbot

Parallel Bayesian MCMC Imputation for Multiple Distributed Lag Models: A Case Study in Environmental Epidemiology by Brian Caffo, Roger Peng, Francesca Dominici, Thomas Louis and Scott Zeger.


Fast Learning of Clusters and Topics via Sparse Posteriors

arXiv.org Machine Learning

Mixture models and topic models generate each observation from a single cluster, but standard variational posteriors for each observation assign positive probability to all possible clusters. This requires dense storage and runtime costs that scale with the total number of clusters, even though typically only a few clusters have significant posterior mass for any data point. We propose a constrained family of sparse variational distributions that allow at most $L$ non-zero entries, where the tunable threshold $L$ trades off speed for accuracy. Previous sparse approximations have used hard assignments ($L=1$), but we find that moderate values of $L>1$ provide superior performance. Our approach easily integrates with stochastic or incremental optimization algorithms to scale to millions of examples. Experiments training mixture models of image patches and topic models for news articles show that our approach produces better-quality models in far less time than baseline methods.
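One plausible reading of "at most $L$ non-zero entries" is to keep the $L$ clusters with the largest posterior weight for a data point and renormalize over them. The sketch below illustrates that idea only. The function name and the example weights are hypothetical, and the paper's actual algorithm may differ in its details.

```python
import math


def sparse_responsibilities(log_weights, L):
    """Keep the L largest cluster weights and renormalize them to sum to one.

    log_weights: unnormalized log posterior weights, one per cluster.
    Returns a dict mapping cluster index to probability with at most L entries.
    """
    if L < 1:
        raise ValueError("L must be at least 1")
    # Indices of the L clusters with the largest log weight.
    order = sorted(range(len(log_weights)), key=lambda k: log_weights[k], reverse=True)
    top = order[:L]
    # Subtract the maximum before exponentiating for numerical stability.
    m = log_weights[top[0]]
    unnormalized = {k: math.exp(log_weights[k] - m) for k in top}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}


if __name__ == "__main__":
    dense = [0.5, 0.3, 0.15, 0.05]  # illustrative dense posterior over 4 clusters
    logs = [math.log(p) for p in dense]
    print(sparse_responsibilities(logs, L=2))
```

Setting `L=1` recovers a hard assignment, while `L` equal to the number of clusters recovers the usual dense posterior. Storage per data point is proportional to `L` rather than to the total number of clusters.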


Markov Chain Monte Carlo Without all the Bullshit

#artificialintelligence

I have a little secret: I don't like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated. This shows up when trying to read about Markov Chain Monte Carlo methods. Take, for example, the abstract to the Markov Chain Monte Carlo article in the Encyclopedia of Biostatistics. Markov chain Monte Carlo (MCMC) is a technique for estimating by simulation the expectation of a statistic in a complex model. Successive random selections form a Markov chain, the stationary distribution of which is the target distribution.
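The phrase "stationary distribution" can be illustrated with a tiny example that is not taken from the linked post. The sketch below uses a two-state transition matrix chosen for illustration and repeatedly applies it to a starting distribution until the distribution stops changing.

```python
def step(dist, transition):
    """One step of the chain: new_dist[j] = sum_i dist[i] * transition[i][j]."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(transition, n_steps=200):
    """Approximate the stationary distribution by iterating from a uniform start."""
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(n_steps):
        dist = step(dist, transition)
    return dist


# Illustrative transition matrix: row i holds the probabilities of moving
# from state i to each state, so every row sums to one.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]

if __name__ == "__main__":
    pi = stationary_distribution(P)
    print("approximate stationary distribution:", pi)
    print("after one more step:", step(pi, P))
```

For this matrix the distribution settles at (5/6, 1/6), and applying one more step leaves it unchanged. MCMC methods work in the opposite direction: they start from a desired target distribution and construct a chain whose stationary distribution is that target.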


Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences

arXiv.org Machine Learning

We introduce Delay Pruning, a simple yet powerful technique to regularize dynamic Boltzmann machines (DyBM). The recently introduced DyBM provides a particularly structured Boltzmann machine, as a generative model of a multi-dimensional time-series. This Boltzmann machine can have infinitely many layers of units but allows exact inference and learning based on its biologically motivated structure. DyBM uses the idea of conduction delays in the form of fixed-length first-in first-out (FIFO) queues, with a neuron connected to another via this FIFO queue, and spikes from a pre-synaptic neuron travel along the queue to the post-synaptic neuron with a constant period of delay. Here, we present Delay Pruning as a mechanism to prune the lengths of the FIFO queues (making them zero) by setting some delay lengths to one with a fixed probability, and finally selecting the best performing model with fixed delays. The uniqueness of structure and a non-sampling-based learning rule in DyBM make the application of previously proposed regularization techniques like Dropout or DropConnect difficult, leading to poor generalization. First, we evaluate the performance of Delay Pruning to let DyBM learn a multidimensional temporal sequence generated by a Markov chain. Finally, we show the effectiveness of Delay Pruning in learning high-dimensional sequences using the moving MNIST dataset, and compare it with Dropout and DropConnect methods.


Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network

arXiv.org Machine Learning

Bibliographic analysis considers the author's research areas, the citation network and the paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents, using a nonparametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. This gives rise to the Citation Network Topic Model (CNTM). We propose a novel and efficient inference algorithm for the CNTM to explore subsets of research publications from CiteSeerX. The publication datasets are organised into three corpora, totalling about 168k publications with about 62k authors. The queried datasets are made available online. In three publicly available corpora in addition to the queried datasets, our proposed model demonstrates an improved performance in both model fitting and document clustering, compared to several baselines. Moreover, our model allows extraction of additional useful knowledge from the corpora, such as the visualisation of the author-topics network. Additionally, we propose a simple method to incorporate supervision into topic modelling to achieve further improvement on the clustering task.


Gaussian Process Pseudo-Likelihood Models for Sequence Labeling

arXiv.org Machine Learning

Several machine learning problems arising in natural language processing can be modeled as a sequence labeling problem. Gaussian processes (GPs) provide a Bayesian approach to learning such problems in a kernel based framework. We develop Gaussian process models based on pseudo-likelihood to solve sequence labeling problems. The pseudo-likelihood model enables one to capture multiple dependencies among the output components of the sequence without becoming computationally intractable. We use an efficient variational Gaussian approximation method to perform inference in the proposed model. We also provide an iterative algorithm which can effectively make use of the information from the neighboring labels to perform prediction. The ability to capture multiple dependencies makes the proposed approach useful for a wide range of sequence labeling problems. Numerical experiments on some sequence labeling problems in natural language processing demonstrate the usefulness of the proposed approach.


Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices

arXiv.org Machine Learning

Recently, there has been a surge of interest in using spectral methods for estimating latent variable models. However, it is usually assumed that the distribution of the observations conditioned on the latent variables is either discrete or belongs to a parametric family. In this paper, we study the estimation of an $m$-state hidden Markov model (HMM) with only smoothness assumptions, such as Hölderian conditions, on the emission densities. By leveraging some recent advances in continuous linear algebra and numerical analysis, we develop a computationally efficient spectral algorithm for learning nonparametric HMMs. Our technique is based on computing an SVD on nonparametric estimates of density functions by viewing them as continuous matrices. We derive sample complexity bounds via concentration results for nonparametric density estimation and novel perturbation theory results for continuous matrices. We implement our method using Chebyshev polynomial approximations. Our method is competitive with other baselines on synthetic and real problems and is also very computationally efficient.


On the Geometric Ergodicity of Hamiltonian Monte Carlo

arXiv.org Machine Learning

We establish general conditions under which Markov chains produced by the Hamiltonian Monte Carlo method will and will not be geometrically ergodic. We consider implementations with both position-independent and position-dependent integration times. In the former case we find that the conditions for geometric ergodicity are essentially a non-vanishing gradient of the log-density which asymptotically points towards the centre of the space and does not grow faster than linearly. In an idealised scenario in which the integration time is allowed to change in different regions of the space, we show that geometric ergodicity can be recovered for a much broader class of tail behaviours, leading to some guidelines for the choice of this free parameter in practice.