Markov Models
Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path
Hsu, Daniel J., Kontorovich, Aryeh, Szepesvari, Csaba
This article provides the first procedure for computing a fully data-dependent interval that traps the mixing time $t_{mix}$ of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge. The interval is constructed around the relaxation time $t_{relax}$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy. The lower bounds indicate that, unless further restrictions are placed on the chain, no procedure can achieve this accuracy level before seeing each state at least $\Omega(t_{relax})$ times on average. Finally, future directions of research are identified.
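To make the relaxation time concrete, the sketch below computes a plug-in point estimate of $t_{relax} = 1/(1 - |\lambda_2|)$ from one sample path of a two-state chain. It is an illustration under stated assumptions, not the paper's procedure: it returns only a point estimate (no confidence interval), it relies on the two-state identity $\lambda_2 = P_{00} + P_{11} - 1$, and the function names `simulate_two_state` and `plugin_relaxation_time` are hypothetical.

```python
import random


def simulate_two_state(p, q, n, seed=0):
    """Sample a length-n path from a 2-state chain with P(0->1)=p and P(1->0)=q."""
    rng = random.Random(seed)
    state = 0
    path = [state]
    for _ in range(n - 1):
        flip_prob = p if state == 0 else q
        if rng.random() < flip_prob:
            state = 1 - state
        path.append(state)
    return path


def plugin_relaxation_time(path):
    """Plug-in estimate of t_relax = 1 / (1 - |lambda_2|) for a 2-state chain.

    Assumes both states are visited and left at least once; otherwise a
    division by zero occurs.
    """
    counts = [[0, 0], [0, 0]]
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    # Empirical self-transition probabilities.
    p00 = counts[0][0] / (counts[0][0] + counts[0][1])
    p11 = counts[1][1] / (counts[1][0] + counts[1][1])
    # For a 2x2 stochastic matrix the eigenvalues are 1 and trace - 1.
    lam2 = p00 + p11 - 1.0
    return 1.0 / (1.0 - abs(lam2))


if __name__ == "__main__":
    path = simulate_two_state(p=0.1, q=0.2, n=50000, seed=1)
    # True value for these parameters is 1 / (1 - 0.7) = 10/3.
    print(plugin_relaxation_time(path))
```

For chains with more than two states, the same idea would need an eigenvalue computation on a symmetrized empirical transition matrix, which this sketch deliberately leaves out.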
Fast Bidirectional Probability Estimation in Markov Models
Banerjee, Siddhartha, Lofgren, Peter
We develop a new bidirectional algorithm for estimating Markov chain multi-step transition probabilities: given a Markov chain, we want to estimate the probability of hitting a given target state in $\ell$ steps after starting from a given source distribution. Given the target state $t$, we use a (reverse) local power iteration to construct an "expanded target distribution", which has the same mean as the quantity we want to estimate, but a smaller variance -- this can then be sampled efficiently by a Monte Carlo algorithm. Our method extends to any Markov chain on a discrete (finite or countable) state-space, and can be extended to compute functions of multi-step transition probabilities such as PageRank, graph diffusions, hitting/return times, etc. Our main result is that in "sparse" Markov chains -- wherein the number of transitions between states is comparable to the number of states -- the running time of our algorithm for a uniform-random target node is order-wise smaller than Monte Carlo and power iteration based algorithms; in particular, our method can estimate a probability $p$ using only $O(1/\sqrt{p})$ running time.
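The bidirectional idea can be sketched with the identity $P^{\ell}[s,t] = \sum_v P^{\ell-k}[s,v]\,P^{k}[v,t]$: compute the vector $v \mapsto P^{k}[v,t]$ by iterating backwards from the target, then average that vector over the endpoints of forward random walks of length $\ell - k$ from the source. The code below is a simplified illustration, not the authors' algorithm: it uses an exact reverse iteration over all states rather than a local, approximate one, and the names `reverse_iterate`, `bidirectional_estimate`, and `EXAMPLE_CHAIN` are hypothetical.

```python
import random


def reverse_iterate(chain, target, k):
    """Return a dict mapping v to P^k[v, target].

    `chain` maps each state u to a dict {v: P[u, v]}.
    """
    vec = {target: 1.0}
    for _ in range(k):
        nxt = {}
        for u, row in chain.items():
            val = sum(prob * vec.get(v, 0.0) for v, prob in row.items())
            if val > 0.0:
                nxt[u] = val
        vec = nxt
    return vec


def walk(chain, start, steps, rng):
    """Run a forward random walk and return its final state."""
    state = start
    for _ in range(steps):
        row = chain[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
    return state


def bidirectional_estimate(chain, source, target, ell, k, n_walks, seed=0):
    """Unbiased estimate of P^ell[source, target] using k reverse steps."""
    rng = random.Random(seed)
    expanded = reverse_iterate(chain, target, k)
    total = 0.0
    for _ in range(n_walks):
        end = walk(chain, source, ell - k, rng)
        total += expanded.get(end, 0.0)
    return total / n_walks


# A small lazy walk on a path graph, used only for illustration.
EXAMPLE_CHAIN = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.25, 2: 0.5, 3: 0.25},
    3: {2: 0.5, 3: 0.5},
}

if __name__ == "__main__":
    exact = reverse_iterate(EXAMPLE_CHAIN, 3, 6).get(0, 0.0)
    approx = bidirectional_estimate(EXAMPLE_CHAIN, 0, 3, ell=6, k=3, n_walks=20000)
    print(exact, approx)
```

Each walk contributes a value spread over many states near the target instead of a 0/1 hit indicator, which is the source of the variance reduction the abstract describes.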
Lifted Symmetry Detection and Breaking for MAP Inference
Kopp, Timothy, Singla, Parag, Kautz, Henry
Symmetry breaking is a technique for speeding up propositional satisfiability testing by adding constraints to the theory that restrict the search space while preserving satisfiability. In this work, we extend symmetry breaking to the problem of model finding in weighted and unweighted relational theories, a class of problems that includes MAP inference in Markov Logic and similar statistical-relational languages. We introduce term symmetries, which are induced by an evidence set and extend to symmetries over a relational theory. We provide the important special case of term equivalent symmetries, showing that such symmetries can be found in low-degree polynomial time. We show how to break an exponential number of these symmetries with added constraints whose number is linear in the size of the domain. We demonstrate the effectiveness of these techniques through experiments in two relational domains. We also discuss the connections between relational symmetry breaking and work on lifted inference in statistical-relational reasoning.
Scalable Adaptation of State Complexity for Nonparametric Hidden Markov Models
Hughes, Michael C., Stephenson, William T., Sudderth, Erik
Bayesian nonparametric hidden Markov models are typically learned via fixed truncations of the infinite state space or local Monte Carlo proposals that make small changes to the state space. We develop an inference algorithm for the sticky hierarchical Dirichlet process hidden Markov model that scales to big datasets by processing a few sequences at a time yet allows rapid adaptation of the state space cardinality. Unlike previous point-estimate methods, our novel variational bound penalizes redundant or irrelevant states and thus enables optimization of the state space. Our birth proposals use observed data statistics to create useful new states that escape local optima. Merge and delete proposals remove ineffective states to yield simpler models with more affordable future computations. Experiments on speaker diarization, motion capture, and epigenetic chromatin datasets discover models that are more compact, more interpretable, and better aligned to ground truth segmentations than competitors. We have released an open-source Python implementation which can parallelize local inference steps across sequences.
Training Restricted Boltzmann Machine via the Thouless-Anderson-Palmer free energy
Gabrie, Marylou, Tramel, Eric W., Krzakala, Florent
Restricted Boltzmann machines are undirected neural networks which have been shown to be effective in many applications, including serving as initializations for training deep multi-layer neural networks. One of the main reasons for their success is the existence of efficient and practical stochastic algorithms, such as contrastive divergence, for unsupervised training. We propose an alternative deterministic iterative procedure based on an improved mean field method from statistical physics known as the Thouless-Anderson-Palmer approach. We demonstrate that our algorithm provides performance equal to, and sometimes superior to, persistent contrastive divergence, while also providing a clear and easy-to-evaluate objective function. We believe that this strategy can be easily generalized to other models as well as to more accurate higher-order approximations, paving the way for systematic improvements in training Boltzmann machines with hidden units.
Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling
Shang, Xiaocheng, Zhu, Zhanxing, Leimkuhler, Benedict, Storkey, Amos J.
Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov Chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current research addresses the computational benefits of stochastic gradient methods in this setting. Existing techniques rely on estimating the variance or covariance of the subsampling error, and typically assume constant variance. In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.
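As background for the statement that MCMC procedures are often discrete-time analogues of SDEs, the sketch below applies an Euler-Maruyama discretization to the overdamped Langevin equation $dX = \nabla \log \pi(X)\,dt + \sqrt{2}\,dW$ for a one-dimensional standard normal target. This is not the covariance-controlled adaptive Langevin thermostat proposed in the paper: it uses exact gradients, has no thermostat variable, and the function name `langevin_samples` is hypothetical.

```python
import math
import random


def langevin_samples(grad_log_density, x0, step, n_steps, seed=0):
    """Unadjusted Langevin iteration: x <- x + h * grad log pi(x) + sqrt(2h) * N(0, 1)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


def mean_and_variance(values):
    """Return the sample mean and (biased) sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


if __name__ == "__main__":
    # Standard normal target: grad log pi(x) = -x.
    draws = langevin_samples(lambda x: -x, x0=0.0, step=0.1, n_steps=100000)
    # The finite step size inflates the stationary variance to 1 / (1 - step / 2).
    print(mean_and_variance(draws))
```

The small variance inflation caused by the step size is an instance of the discretization bias that motivates more careful schemes; replacing the exact gradient with a subsampled one would add further, parameter-dependent noise.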
Deep Knowledge Tracing
Piech, Chris, Bassen, Jonathan, Huang, Jonathan, Ganguli, Surya, Sahami, Mehran, Guibas, Leonidas J., Sohl-Dickstein, Jascha
Knowledge tracing, where a machine models the knowledge of a student as they interact with coursework, is an established and largely unsolved problem in computer-supported education. In this paper we explore the benefit of using recurrent neural networks to model student learning. This family of models has important advantages over current state-of-the-art methods in that they do not require the explicit encoding of human domain knowledge, and have a far more flexible functional form which can capture substantially more complex student interactions. We show that these neural networks outperform the current state of the art in prediction on real student data, while allowing straightforward interpretation and discovery of structure in the curriculum. These results suggest a promising new line of research for knowledge tracing.
Individual Planning in Infinite-Horizon Multiagent Settings: Inference, Structure and Scalability
This paper provides the first formalization of self-interested planning in multiagent settings using expectation-maximization (EM). Our formalization in the context of infinite-horizon and finitely-nested interactive POMDPs (I-POMDP) is distinct from EM formulations for POMDPs and cooperative multiagent planning frameworks. We exploit the graphical model structure specific to I-POMDPs, and present a new approach based on block-coordinate descent for further speedup. Forward filtering-backward sampling -- a combination of exact filtering with sampling -- is explored to exploit problem structure.
Spectral Learning of Large Structured HMMs for Comparative Epigenomics
Zhang, Chicheng, Song, Jimin, Chaudhuri, Kamalika, Chen, Kevin
We develop a latent variable model and an efficient spectral algorithm motivated by the recent emergence of very large data sets of chromatin marks from multiple human cell types. A natural model for chromatin data in one cell type is a Hidden Markov Model (HMM); we model the relationship between multiple cell types by connecting their hidden states by a fixed tree of known structure. The main challenge with learning parameters of such models is that iterative methods such as EM are very slow, while naive spectral methods result in time and space complexity exponential in the number of cell types. We exploit properties of the tree structure of the hidden states to provide spectral algorithms that are more computationally efficient for current biological datasets. We provide sample complexity bounds for our algorithm and evaluate it experimentally on biological data from nine human cell types. Finally, we show that beyond our specific model, some of our algorithmic ideas can be applied to other graphical models.
Measuring Sample Quality with Stein's Method
Gorham, Jackson, Mackey, Lester
To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein's method that bounds the discrepancy between sample and target expectations over a large class of test functions. We use our tool to compare exact, biased, and deterministic sample sequences and illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.