Goto

Collaborating Authors

 Markov Models


Iterative Neural Autoregressive Distribution Estimator NADE-k

Neural Information Processing Systems

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data. We propose a new model that extends this inference scheme to multiple steps, arguing that it is easier to learn to improve a reconstruction in $k$ steps rather than to learn to reconstruct in a single inference step. The proposed model is an unsupervised building block for deep learning that combines the desirable properties of NADE and multi-predictive training: (1) Its test likelihood can be computed analytically, (2) it is easy to generate independent samples from it, and (3) it uses an inference engine that is a superset of variational inference for Boltzmann machines. The proposed NADE-k is competitive with the state-of-the-art in density estimation on the two datasets tested.


Parallel Sampling of HDPs using Sub-Cluster Splits

Neural Information Processing Systems

We develop a sampling technique for Hierarchical Dirichlet process models. The parallel algorithm builds upon [Chang & Fisher 2013] by proposing large split and merge moves based on learned sub-clusters. The additional global split and merge moves drastically improve convergence in the experimental results. Furthermore, we discover that cross-validation techniques do not adequately determine convergence, and that previous sampling methods converge slower than were previously expected.


Multiscale Fields of Patterns

Neural Information Processing Systems

We describe a framework for defining high-order image models that can be used in a variety of applications. The approach involves modeling local patterns in a multiscale representation of an image. Local properties of a coarsened image reflect non-local properties of the original image. In the case of binary images local properties are defined by the binary patterns observed over small neighborhoods around each pixel. With the multiscale representation we capture the frequency of patterns observed at different scales of resolution. This framework leads to expressive priors that depend on a relatively small number of parameters. For inference and learning we use an MCMC method for block sampling with very large blocks. We evaluate the approach with two example applications. One involves contour detection. The other involves binary segmentation.


Restricted Boltzmann machines modeling human choice

Neural Information Processing Systems

We extend the multinomial logit model to represent some of the empirical phenomena that are frequently observed in the choices made by humans. These phenomena include the similarity effect, the attraction effect, and the compromise effect. We formally quantify the strength of these phenomena that can be represented by our choice model, which illuminates the flexibility of our choice model. We then show that our choice model can be represented as a restricted Boltzmann machine and that its parameters can be learned effectively from data. Our numerical experiments with real data of human choices suggest that we can train our choice model in such a way that it represents the typical phenomena of choice.


A Hidden Markov Model-Based Acoustic Cicada Detector for Crowdsourced Smartphone Biodiversity Monitoring

Journal of Artificial Intelligence Research

In recent years, the field of computational sustainability has striven to apply artificial intelligence techniques to solve ecological and environmental problems. In ecology, a key issue for the safeguarding of our planet is the monitoring of biodiversity. Automated acoustic recognition of species aims to provide a cost-effective method for biodiversity monitoring. This is particularly appealing for detecting endangered animals with a distinctive call, such as the New Forest cicada. To this end, we pursue a crowdsourcing approach, whereby the millions of visitors to the New Forest, where this insect was historically found, will help to monitor its presence by means of a smartphone app that can detect its mating call. Existing research in the field of acoustic insect detection has typically focused upon the classification of recordings collected from fixed field microphones. Such approaches segment a lengthy audio recording into individual segments of insect activity, which are independently classified using cepstral coefficients extracted from the recording as features. This paper reports on a contrasting approach, whereby we use crowdsourcing to collect recordings via a smartphone app, and present an immediate feedback to the users as to whether an insect has been found. Our classification approach does not remove silent parts of the recording via segmentation, but instead uses the temporal patterns throughout each recording to classify the insects present. We show that our approach can successfully discriminate between the call of the New Forest cicada and similar insects found in the New Forest, and is robust to common types of environment noise. A large scale trial deployment of our smartphone app collected over 6000 reports of insect activity from over 1000 users. Despite the cicada not having been rediscovered in the New Forest, the effectiveness of this approach was confirmed for both the detection algorithm, which successfully identified the same cicada through the app in countries where the same species is still present, and of the crowdsourcing methodology, which collected a vast number of recordings and involved thousands of contributors.


Detailed Derivations of Small-Variance Asymptotics for some Hierarchical Bayesian Nonparametric Models

arXiv.org Machine Learning

In this note we provide detailed derivations of two versions of small-variance asymptotics for hierarchical Dirichlet process (HDP) mixture models and the HDP hidden Markov model (HDP-HMM, a.k.a. the infinite HMM). We include derivations for the probabilities of certain CRP and CRF partitions, which are of more general interest.


Accurate and Conservative Estimates of MRF Log-likelihood using Reverse Annealing

arXiv.org Machine Learning

Markov random fields (MRFs) are difficult to evaluate as generative models because computing the test log-probabilities requires the intractable partition function. Annealed importance sampling (AIS) is widely used to estimate MRF partition functions, and often yields quite accurate results. However, AIS is prone to overestimate the log-likelihood with little indication that anything is wrong. We present the Reverse AIS Estimator (RAISE), a stochastic lower bound on the log-likelihood of an approximation to the original MRF model. RAISE requires only the same MCMC transition operators as standard AIS. Experimental results indicate that RAISE agrees closely with AIS log-probability estimates for RBMs, DBMs, and DBNs, but typically errs on the side of underestimating, rather than overestimating, the log-likelihood.


Understanding and Designing Complex Systems: Response to "A framework for optimal high-level descriptions in science and engineering---preliminary report"

arXiv.org Artificial Intelligence

Building compact models of nonlinear processes goes to the heart of our understanding the complex world around us--a world replete with unanticipated, emergent patterns. Via discovery mechanisms that we do not yet understand well, we eventually do come to know many of these patterns, even if we have never seen them before. Such discoveries can be substantial. At a minimum, compact models that capture such emergent "macrostates" are essential tools in harnessing complex processes to useful ends. Most ambitiously, one would hope to automate the discovery process itself, providing an especially useful tool for the era of Big Data. One key problem in the larger endeavor of pattern discovery is dimension reduction: reduce the high-dimensional state space of a stochastic dynamical system into smaller, more manageable models that nonetheless still capture the relevant dynamics. The study of complex systems always requires this.


Tutorial on Structured Continuous-Time Markov Processes

Journal of Artificial Intelligence Research

A continuous-time Markov process (CTMP) is a collection of variables indexed by a continuous quantity, time. It obeys the Markov property that the distribution over a future variable is independent of past variables given the state at the present time. We introduce continuous-time Markov process representations and algorithms for filtering, smoothing, expected sufficient statistics calculations, and model estimation, assuming no prior knowledge of continuous-time processes but some basic knowledge of probability and statistics. We begin by describing "flat" or unstructured Markov processes and then move to structured Markov processes (those arising from state spaces consisting of assignments to variables) including Kronecker, decision-diagram, and continuous-time Bayesian network representations. We provide the first connection between decision-diagrams and continuous-time Bayesian networks.


Circumventing the Curse of Dimensionality in Prediction: Causal Rate-Distortion for Infinite-Order Markov Processes

arXiv.org Machine Learning

Predictive rate-distortion analysis suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures requires resources that typically grow exponentially with length. The challenge is compounded for infinite-order Markov processes, since conditioning on finite sequences cannot capture all of their past dependencies. Spectral arguments show that algorithms which cluster finite-length sequences fail dramatically when the underlying process has long-range temporal correlations and can fail even for processes generated by finite-memory hidden Markov models. We circumvent the curse of dimensionality in rate-distortion analysis of infinite-order processes by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Examples demonstrate that the resulting causal rate-distortion theory substantially improves current predictive rate-distortion analyses.