Country
A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
Xing, Eric P., Jordan, Michael I., Karp, Richard M., Russell, Stuart J.
We propose a dynamic Bayesian model for motifs in biopolymer sequences whichcaptures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution aredistributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EMalgorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves overprevious models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
Cluster Kernels for Semi-Supervised Learning
Chapelle, Olivier, Weston, Jason, Schölkopf, Bernhard
One of the first semi-supervised algorithms [1] was applied to web page classification. This is a typical example where the number of unlabeled examples can be made as large as possible since there are billions of web page, but labeling is expensive since it requires human intervention. Since then, there has been a lot of interest for this paradigm in the machine learning community; an extensive review of existing techniques can be found in [10]. It has been shown experimentally that under certain conditions, the decision function canbe estimated more accurately, yielding lower generalization error [1, 4, 6] . However, in a discriminative framework, it is not obvious to determine how unlabeled dataor even the perfect knowledge of the input distribution P(x) can help in the estimation of the decision function.
Exact MAP Estimates by (Hyper)tree Agreement
Wainwright, Martin J., Jaakkola, Tommi S., Willsky, Alan S.
We describe a method for computing provably exact maximum a posteriori (MAP)estimates for a subclass of problems on graphs with cycles. The basic idea is to represent the original problem on the graph with cycles asa convex combination of tree-structured problems. A convexity argument then guarantees that the optimal value of the original problem (i.e., the log probability of the MAP assignment) is upper bounded by the combined optimal values of the tree problems. We prove that this upper bound is met with equality if and only if the tree problems share an optimal configurationin common. An important implication is that any such shared configuration must also be the MAP configuration for the original problem. Next we develop a tree-reweighted max-product algorithm for attempting to find convex combinations of tree-structured problems that share a common optimum. We give necessary and sufficient conditions for a fixed point to yield the exact MAP estimate. An attractive feature of our analysis is that it generalizes naturally to convex combinations of hypertree-structured distributions.
Learning Attractor Landscapes for Learning Motor Primitives
Ijspeert, Auke J., Nakanishi, Jun, Schaal, Stefan
Many control problems take place in continuous state-action spaces, e.g., as in manipulator robotics, where the control objective is often definedas finding a desired trajectory that reaches a particular goal state. While reinforcement learning offers a theoretical framework tolearn such control policies from scratch, its applicability to higher dimensional continuous state-action spaces remains rather limited to date. Instead of learning from scratch, in this paper we suggest to learn a desired complex control policy by transforming an existing simple canonical control policy. For this purpose, we represent canonical policies in terms of differential equations with well-defined attractor properties. By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated withoutlosing the stability properties of the canonical system.
Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement
Wolfe, Patrick J., Godsill, Simon J.
The Bayesian paradigm provides a natural and effective means of exploiting priorknowledge concerning the time-frequency structure of sound signals such as speech and music--something which has often been overlooked intraditional audio signal processing approaches. Here, after constructing aBayesian model and prior distributions capable of taking into account the time-frequency characteristics of typical audio waveforms, we apply Markov chain Monte Carlo methods in order to sample from the resultant posterior distribution of interest. We present speech enhancement resultswhich compare favourably in objective terms with standard time-varying filtering techniques (and in several cases yield superior performance, bothobjectively and subjectively); moreover, in contrast to such methods, our results are obtained without an assumption of prior knowledge of the noise power.
Exponential Family PCA for Belief Compression in POMDPs
Roy, Nicholas, Gordon, Geoffrey J.
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability ofthese algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, highdimensional beliefspace into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
Real-Time Particle Filters
Kwok, Cody, Fox, Dieter, Meila, Marina
Particle filters estimate the state of dynamical systems from sensor information. Inmany real time applications of particle filters, however, sensor information arrives at a significantly higher rate than the update rate of the filter. The prevalent approach to dealing with such situations is to update the particle filter as often as possible and to discard sensor information that cannot be processed in time. In this paper we present real-time particle filters, whichmake use of all sensor information even when the filter update rate is below the update rate of the sensors. This is achieved by representing posteriorsas mixtures of sample sets, where each mixture component integrates one observation arriving during a filter update. The weights of the mixture components are set so as to minimize the approximation error introduced by the mixture representation. Thereby, our approach focuses computational resources (samples) on valuable sensor information. Experiments usingdata collected with a mobile robot show that our approach yields strong improvements over other approaches.
On the Dirichlet Prior and Bayesian Regularization
Steck, Harald, Jaakkola, Tommi S.
In the Bayesian approach, regularizationis achieved by specifying a prior distribution over the parameters and subsequently averaging over the posterior distribution. This regularization provides not only smoother estimates of the parameters compared to maximum likelihood but also guides the selection of model structures. It was pointed out in [6] that a very large scale parameter of the Dirichlet prior can degrade predictive accuracy due to severe regularization of the parameter estimates. We complement this discussion here and show that a very small scale parameter can lead to poor over-regularized structures when a product of (conjugate) Dirichlet priors is used over multinomial conditional distributions (Section 3). Section 4 demonstrates the effect of the scale parameter and how it can be calibrated. We focus on the class of Bayesian network models throughout this paper.
Replay, Repair and Consolidation
A standard view of memory consolidation is that episodes are stored temporarily inthe hippocampus, and are transferred to the neocortex through replay. Various recent experimental challenges to the idea of transfer, particularly for human memory, are forcing its reevaluation. However, although there is independent neurophysiological evidence for replay, short of transfer, there are few theoretical ideas for what it might be doing. We suggest and demonstrate two important computational roles associated with neocortical indices.
Fast Sparse Gaussian Process Methods: The Informative Vector Machine
Herbrich, Ralf, Lawrence, Neil D., Seeger, Matthias
We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles,previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated inO(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements.