Undirected Networks
Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
Allahverdyan, Armen, Galstyan, Aram
We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. VT objective, in contrast, is shown to have only finite degeneracy. Furthermore, VT converges faster and results in sparser (simpler) models, thus realizing an automatic Occam's razor for HMM learning. For more general scenario VT can be worse compared to ML but still capable of correctly recovering most of the parameters.
Accelerated Adaptive Markov Chain for Partition Function Computation
Ermon, Stefano, Gomes, Carla P., Sabharwal, Ashish, Selman, Bart
We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of ``null moves'' in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories.
Inference in continuous-time change-point models
Stimberg, Florian, Opper, Manfred, Sanguinetti, Guido, Ruttor, Andreas
We consider the problem of Bayesian inference for continuous time multi-stable stochastic systems which can change both their diffusion and drift parameters at discrete times. We propose exact inference and sampling methodologies for two specific cases where the discontinuous dynamics is given by a Poisson process and a two-state Markovian switch. We test the methodology on simulated data, and apply it to two real data sets in finance and systems biology. Our experimental results show that the approach leads to valid inferences and non-trivial insights.
Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning
Pajarinen, Joni K., Peltonen, Jaakko
Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. The policy in infinite-horizon POMDP and DEC-POMDP problems has been represented as finite state controllers (FSCs). We introduce a novel class of periodic FSCs, composed of layers connected only to the previous and next layer. Our periodic FSC method finds a deterministic finite-horizon policy and converts it to an initial periodic infinite-horizon policy. This policy is optimized by a new infinite-horizon algorithm to yield deterministic periodic policies, and by a new expectation maximization algorithm to yield stochastic periodic policies. Our method yields better results than earlier planning methods and can compute larger solutions than with regular FSCs.
On Tracking The Partition Function
Desjardins, Guillaume, Bengio, Yoshua, Courville, Aaron C.
Markov Random Fields (MRFs) have proven very powerful both as density estimators and feature extractors for classification. However, their use is often limited by an inability to estimate the partition function $Z$. In this paper, we exploit the gradient descent training procedure of restricted Boltzmann machines (a type of MRF) to {\bf track} the log partition function during learning. Our method relies on two distinct sources of information: (1) estimating the change $\Delta Z$ incurred by each gradient update, (2) estimating the difference in $Z$ over a small set of tempered distributions using bridge sampling. The two sources of information are then combined using an inference procedure similar to Kalman filtering. Learning MRFs through Tempered Stochastic Maximum Likelihood, we can estimate $Z$ using no more temperatures than are required for learning. Comparing to both exact values and estimates using annealed importance sampling (AIS), we show on several datasets that our method is able to accurately track the log partition function. In contrast to AIS, our method provides this estimate at each time-step, at a computational cost similar to that required for training alone.
Quasi-Newton Methods for Markov Chain Monte Carlo
Zhang, Yichuan, Sutton, Charles A.
The performance of Markov chain Monte Carlo methods is often sensitive to the scaling and correlations between the random variables of interest. An important source of information about the local correlation and scale is given by the Hessian matrix of the target distribution, but this is often either computationally expensive or infeasible. In this paper we propose MCMC samplers that make use of quasi-Newton approximations from the optimization literature, that approximate the Hessian of the target distribution from previous samples and gradients generated by the sampler. A key issue is that MCMC samplers that depend on the history of previous states are in general not valid. We address this problem by using limited memory quasi-Newton methods, which depend only on a fixed window of previous samples. On several real world datasets, we show that the quasi-Newton sampler is a more effective sampler than standard Hamiltonian Monte Carlo at a fraction of the cost of MCMC methods that require higher-order derivatives.
Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability
Reichert, David P., Series, Peggy, Storkey, Amos J.
It has been argued that perceptual multistability reflects probabilistic inference performed by the brain when sensory input is ambiguous. Alternatively, more traditional explanations of multistability refer to low-level mechanisms such as neuronal adaptation. We employ a Deep Boltzmann Machine (DBM) model of cortical processing to demonstrate that these two different approaches can be combined in the same framework. Based on recent developments in machine learning, we show how neuronal adaptation can be understood as a mechanism that improves probabilistic, sampling-based inference. Using the ambiguous Necker cube image, we analyze the perceptual switching exhibited by the model. We also examine the influence of spatial attention, and explore how binocular rivalry can be modeled with the same approach. Our work joins earlier studies in demonstrating how the principles underlying DBMs relate to cortical processing, and offers novel perspectives on the neural implementation of approximate probabilistic inference in the brain.
Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation
Hsieh, Cho-jui, Dhillon, Inderjit S., Ravikumar, Pradeep K., Sustik, Mátyás A.
The L_1 regularized Gaussian maximum likelihood estimator has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem which is a regularized log-determinant program. In contrast to other state-of-the-art methods that largely use first order gradient information, our algorithm is based on Newton's method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and also present experimental results using synthetic and real application data that demonstrate the considerable improvements in performance of our method when compared to other state-of-the-art methods.
Priors over Recurrent Continuous Time Processes
Saeedi, Ardavan, Bouchard-côté, Alexandre
We introduce the Gamma-Exponential Process (GEP), a prior over a large family ofcontinuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability and closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, weshow how posterior inference can be carried efficiently using Particle MCMC methods [1]. This yields a MCMC algorithm that can resample entire sequences atomicallywhile avoiding the complications of introducing slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling[4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach.
Spectral Methods for Learning Multivariate Latent Tree Structure
Anandkumar, Animashree, Chaudhuri, Kamalika, Hsu, Daniel J., Kakade, Sham M., Song, Le, Zhang, Tong
This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.