Learning Graphical Models
Handling Missing Data with Variational Bayesian Learning of ICA
Chan, Kwokleung, Lee, Te-Won, Sejnowski, Terrence J.
Modeling the distributions of the independent sources with mixture of Gaussians allows sources to be estimated with different kurtosis and skewness. The variational Bayesian method automatically determines the dimensionality of the data and yields an accurate density model for the observed data without overfitting problems.
Using Tarjan's Red Rule for Fast Dependency Tree Construction
We focus on the problem of efficient learning of dependency trees. It is well-known that given the pairwise mutual information coefficients, a minimum-weight spanning tree algorithm solves this problem exactly and in polynomial time. However, for large data-sets it is the construction ofthe correlation matrix that dominates the running time. We have developed a new spanning-tree algorithm which is capable of exploiting partial knowledge about edge weights. The partial knowledge we maintain isa probabilistic confidence interval on the coefficients, which we derive by examining just a small sample of the data. The algorithm is able to flag the need to shrink an interval, which translates to inspection ofmore data for the particular attribute pair. Experimental results show running time that is near-constant in the number of records, without significantloss in accuracy of the generated trees. Interestingly, our spanning-tree algorithm is based solely on Tarjan's red-edge rule, which is generally considered a guaranteed recipe for bad performance.
Timing and Partial Observability in the Dopamine System
Daw, Nathaniel D., Courville, Aaron C., Touretzky, David S.
According to a series of influential models, dopamine (DA) neurons signal rewardprediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewardsusing a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying stateof the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previouslyvexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.
Dynamical Causal Learning
Danks, David, Griffiths, Thomas L., Tenenbaum, Joshua B.
This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1 Introduction Currently active quantitative models of human causal judgment for single (and sometimes multiple) causes include conditional
Application of Variational Bayesian Approach to Speech Recognition
Watanabe, Shinji, Minami, Yasuhiro, Nakamura, Atsushi, Ueda, Naonori
Application of V ariational Bayesian Approach to Speech Recognition Shinji Watanabe, Y asuhiro Minami, Atsushi Nakamura and Naonori Ueda NTT Communication Science Laboratories, NTT Corporation 2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan {watanabe,minami,ats,ueda}@cslab.kecl.ntt.co.jp Abstract In this paper, we propose a Bayesian framework, which constructs shared-state triphone HMMs based on a variational Bayesian approach, and recognizes speech based on the Bayesian prediction classification; variational Bayesian estimation and clustering for speech recognition (VBEC). An appropriate model structure with high recognition performance can be found within a VBEC framework. Unlike conventional methods, including BIC or MDL criterion based on the maximum likelihood approach, the proposed model selection is valid in principle, even when there are insufficient amounts of data, because it does not use an asymptotic assumption. In isolated word recognition experiments, we show the advantage of VBEC over conventional methods, especially when dealing with small amounts of data. 1 Introduction A statistical modeling of spectral features of speech (acoustic modeling) is one of the most crucial parts in the speech recognition. In acoustic modeling, a triphone-based hidden Markov model (triphone HMM) has been widely employed.
Discriminative Densities from Maximum Contrast Estimation
Meinicke, Peter, Twellmann, Thorsten, Ritter, Helge
We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions ina way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. We show that maximization ofthe contrast is equivalent to minimization of an approximation of the Bayes risk.
A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
Xing, Eric P., Jordan, Michael I., Karp, Richard M., Russell, Stuart J.
We propose a dynamic Bayesian model for motifs in biopolymer sequences whichcaptures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution aredistributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EMalgorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves overprevious models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
Exact MAP Estimates by (Hyper)tree Agreement
Wainwright, Martin J., Jaakkola, Tommi S., Willsky, Alan S.
We describe a method for computing provably exact maximum a posteriori (MAP)estimates for a subclass of problems on graphs with cycles. The basic idea is to represent the original problem on the graph with cycles asa convex combination of tree-structured problems. A convexity argument then guarantees that the optimal value of the original problem (i.e., the log probability of the MAP assignment) is upper bounded by the combined optimal values of the tree problems. We prove that this upper bound is met with equality if and only if the tree problems share an optimal configurationin common. An important implication is that any such shared configuration must also be the MAP configuration for the original problem. Next we develop a tree-reweighted max-product algorithm for attempting to find convex combinations of tree-structured problems that share a common optimum. We give necessary and sufficient conditions for a fixed point to yield the exact MAP estimate. An attractive feature of our analysis is that it generalizes naturally to convex combinations of hypertree-structured distributions.
Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement
Wolfe, Patrick J., Godsill, Simon J.
The Bayesian paradigm provides a natural and effective means of exploiting priorknowledge concerning the time-frequency structure of sound signals such as speech and music--something which has often been overlooked intraditional audio signal processing approaches. Here, after constructing aBayesian model and prior distributions capable of taking into account the time-frequency characteristics of typical audio waveforms, we apply Markov chain Monte Carlo methods in order to sample from the resultant posterior distribution of interest. We present speech enhancement resultswhich compare favourably in objective terms with standard time-varying filtering techniques (and in several cases yield superior performance, bothobjectively and subjectively); moreover, in contrast to such methods, our results are obtained without an assumption of prior knowledge of the noise power.
Exponential Family PCA for Belief Compression in POMDPs
Roy, Nicholas, Gordon, Geoffrey J.
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability ofthese algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, highdimensional beliefspace into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.