Bayesian Inference
Handling Missing Data with Variational Bayesian Learning of ICA
Chan, Kwokleung, Lee, Te-Won, Sejnowski, Terrence J.
Modeling the distributions of the independent sources with a mixture of Gaussians allows sources to be estimated with different kurtosis and skewness. The variational Bayesian method automatically determines the dimensionality of the data and yields an accurate density model for the observed data without overfitting problems.
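A mixture of Gaussians can indeed take on a range of skewness and kurtosis values. The sketch below is an illustration, not the authors' code. It computes both quantities for a one-dimensional mixture from its component weights, means and variances. The function name `mog_moments` and the example parameters are hypothetical.

```python
def mog_moments(weights, means, variances):
    """Return (mean, variance, skewness, excess kurtosis) of a 1-D Gaussian mixture."""
    comps = list(zip(weights, means, variances))
    # Raw moments E[x^p] of each Gaussian component, mixed by the component weights.
    e1 = sum(w * m for w, m, v in comps)
    e2 = sum(w * (m**2 + v) for w, m, v in comps)
    e3 = sum(w * (m**3 + 3 * m * v) for w, m, v in comps)
    e4 = sum(w * (m**4 + 6 * m**2 * v + 3 * v**2) for w, m, v in comps)
    # Convert raw moments to central moments.
    var = e2 - e1**2
    c3 = e3 - 3 * e1 * e2 + 2 * e1**3
    c4 = e4 - 4 * e1 * e3 + 6 * e1**2 * e2 - 3 * e1**4
    return e1, var, c3 / var**1.5, c4 / var**2 - 3.0


# Same mean, different variances: symmetric and heavy-tailed (excess kurtosis about 1.92).
heavy = mog_moments([0.5, 0.5], [0.0, 0.0], [1.0, 9.0])
# Two separated modes: symmetric and light-tailed (excess kurtosis about -1.28).
bimodal = mog_moments([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
# Unequal weights with shifted means: positively skewed.
skewed = mog_moments([0.8, 0.2], [0.0, 3.0], [1.0, 1.0])
```

Because super-Gaussian, sub-Gaussian and skewed densities all come from one parametric family, a mixture-of-Gaussians source model does not need to fix the sign of the kurtosis in advance.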
Dynamical Causal Learning
Danks, David, Griffiths, Thomas L., Tenenbaum, Joshua B.
This paper focuses on people's short-run behavior by examining dynamical versions of these three theories and comparing their predictions to a real-world dataset.
1 Introduction: Currently active quantitative models of human causal judgment for single (and sometimes multiple) causes include conditional …
Application of Variational Bayesian Approach to Speech Recognition
Watanabe, Shinji, Minami, Yasuhiro, Nakamura, Atsushi, Ueda, Naonori
NTT Communication Science Laboratories, NTT Corporation, 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan ({watanabe,minami,ats,ueda}@cslab.kecl.ntt.co.jp)
In this paper, we propose a Bayesian framework that constructs shared-state triphone HMMs using a variational Bayesian approach and recognizes speech with Bayesian prediction classification. We call this framework variational Bayesian estimation and clustering for speech recognition (VBEC). The VBEC framework can find an appropriate model structure with high recognition performance. Conventional model selection methods, such as the BIC or MDL criteria, are based on the maximum likelihood approach. Unlike them, the proposed model selection is valid in principle even when the amount of data is insufficient, because it does not rely on an asymptotic assumption. In isolated word recognition experiments, we show the advantage of VBEC over conventional methods, especially when only small amounts of data are available.
1 Introduction: Statistical modeling of the spectral features of speech (acoustic modeling) is one of the most crucial parts of speech recognition. In acoustic modeling, the triphone-based hidden Markov model (triphone HMM) has been widely employed.
Discriminative Densities from Maximum Contrast Estimation
Meinicke, Peter, Twellmann, Thorsten, Ritter, Helge
We propose a framework for classifier design based on discriminative densities. These densities represent the differences between the class-conditional distributions in a way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of an objective function that measures the average (bounded) difference, i.e., the contrast, between discriminative densities. We show that maximizing the contrast is equivalent to minimizing an approximation of the Bayes risk.
A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
Xing, Eric P., Jordan, Michael I., Karp, Richard M., Russell, Stuart J.
We propose a dynamic Bayesian model for motifs in biopolymer sequences that captures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for the monomer distribution are distributed as a latent Dirichlet-mixture random variable, and that the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves over previous models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
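The prior structure described above can be made concrete as a generative process. A hidden Markov chain picks a Dirichlet component at each motif position. That component's Dirichlet draws the position's monomer distribution, and motif sites are sampled from the resulting profile. This is a minimal NumPy sketch under assumed values. The two components, their Dirichlet parameters, the transition matrix and the function names are all hypothetical. The sketch only samples from such a prior and does not perform the variational EM fitting or motif detection.

```python
import numpy as np


def sample_motif_profile(length, start, trans, alphas, rng):
    """Sample hidden Dirichlet components along the motif, then one monomer distribution per position."""
    k = rng.choice(len(start), p=start)  # component for the first position
    states, profile = [], []
    for pos in range(length):
        if pos > 0:
            k = rng.choice(len(start), p=trans[k])  # Markov transition between positions
        states.append(k)
        profile.append(rng.dirichlet(alphas[k]))  # position-specific multinomial parameters
    return np.array(states), np.array(profile)


def sample_sites(profile, n_sites, rng):
    """Draw n_sites motif instances (monomer indices) from a sampled profile."""
    length, n_monomers = profile.shape
    sites = np.empty((n_sites, length), dtype=int)
    for j in range(length):
        sites[:, j] = rng.choice(n_monomers, size=n_sites, p=profile[j])
    return sites


rng = np.random.default_rng(0)
start = np.array([0.5, 0.5])
trans = np.array([[0.8, 0.2], [0.3, 0.7]])
# Hypothetical components: small alphas give peaked (conserved) columns, large alphas near-uniform ones.
alphas = np.array([[0.2] * 4, [5.0] * 4])
states, profile = sample_motif_profile(8, start, trans, alphas, rng)
sites = sample_sites(profile, 20, rng)
```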
Exact MAP Estimates by (Hyper)tree Agreement
Wainwright, Martin J., Jaakkola, Tommi S., Willsky, Alan S.
We describe a method for computing provably exact maximum a posteriori (MAP) estimates for a subclass of problems on graphs with cycles. The basic idea is to represent the original problem on the graph with cycles as a convex combination of tree-structured problems. A convexity argument then guarantees that the optimal value of the original problem (i.e., the log probability of the MAP assignment) is upper bounded by the combined optimal values of the tree problems. We prove that this upper bound is met with equality if and only if the tree problems share an optimal configuration in common. An important implication is that any such shared configuration must also be the MAP configuration for the original problem. Next, we develop a tree-reweighted max-product algorithm that attempts to find convex combinations of tree-structured problems that share a common optimum. We give necessary and sufficient conditions for a fixed point to yield the exact MAP estimate. An attractive feature of our analysis is that it generalizes naturally to convex combinations of hypertree-structured distributions.
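The convexity bound and its equality condition can be checked by brute force on the smallest cyclic graph, a triangle of binary variables. Each of the three spanning trees drops one edge. With uniform tree weights of 1/3, every edge appears with probability 2/3. The sketch therefore scales edge potentials by 3/2 inside each tree, so the weighted trees average back to the original problem. This only illustrates the bound. It is not the tree-reweighted max-product algorithm, and the potentials and function names are hypothetical.

```python
import itertools

# Triangle of binary variables: the smallest graph with a cycle.
EDGES = [(0, 1), (1, 2), (0, 2)]


def score(x, node_pot, edge_pot):
    """theta(x): node terms plus the edge terms present in edge_pot."""
    total = sum(node_pot[s][x[s]] for s in range(len(node_pot)))
    total += sum(pot[x[s]][x[t]] for (s, t), pot in edge_pot.items())
    return total


def max_score(node_pot, edge_pot):
    """Exact maximum over all configurations (brute force; fine for 3 nodes)."""
    return max(score(x, node_pot, edge_pot)
               for x in itertools.product((0, 1), repeat=len(node_pot)))


def tree_bound(node_pot, edge_pot):
    """Upper bound sum_T mu_T * max_x theta^T(x) with uniform weights on the 3 spanning trees."""
    mu_tree = 1.0 / 3.0  # weight of each spanning tree
    rho = 2.0 / 3.0      # each edge appears in 2 of the 3 trees
    bound = 0.0
    for dropped in edge_pot:  # each spanning tree omits exactly one edge
        tree_pot = {e: [[v / rho for v in row] for row in pot]
                    for e, pot in edge_pot.items() if e != dropped}
        bound += mu_tree * max_score(node_pot, tree_pot)
    return bound


# Attractive edges with a shared bias toward state 1: every tree is maximized by (1, 1, 1).
node_attr = [[0.0, 1.0] for _ in range(3)]
edge_attr = {e: [[1.0, 0.0], [0.0, 1.0]] for e in EDGES}
# Frustrated edges that reward disagreement: the trees have no common optimum.
node_frus = [[0.0, 0.0] for _ in range(3)]
edge_frus = {e: [[0.0, 1.0], [1.0, 0.0]] for e in EDGES}
```

In the attractive case, the trees share an optimum and the bound equals the exact MAP value. In the frustrated case, no single configuration is optimal for all three trees, and the bound is strictly loose.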
Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement
Wolfe, Patrick J., Godsill, Simon J.
The Bayesian paradigm provides a natural and effective means of exploiting prior knowledge about the time-frequency structure of sound signals such as speech and music, something that traditional audio signal processing approaches have often overlooked. Here, we first construct a Bayesian model and prior distributions capable of taking into account the time-frequency characteristics of typical audio waveforms. We then apply Markov chain Monte Carlo methods to sample from the resulting posterior distribution of interest. We present speech enhancement results that compare favourably in objective terms with standard time-varying filtering techniques (and in several cases yield superior performance, both objectively and subjectively). Moreover, in contrast to such methods, our results are obtained without assuming prior knowledge of the noise power.
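An MCMC sampler can infer the noise power rather than take it as given. The heavily simplified Gibbs sampler below shows how, and it is not the authors' model. Several assumptions are made. The time-frequency coefficients are taken as already computed. Each clean coefficient gets an independent Gaussian prior whose variance has an inverse-gamma prior. The unknown noise variance also gets an inverse-gamma prior. The paper's priors capture time-frequency structure across neighbouring coefficients, which this sketch omits. The function name and hyperparameter values are hypothetical.

```python
import numpy as np


def gibbs_denoise(y, n_iter=2000, burn_in=500, a=1.0, b=1e-3, c=1.0, d=1e-3, seed=0):
    """Gibbs sampler for y = x + noise with unknown noise variance.

    Assumed model (a simplification, not the paper's prior):
      x_k | lam_k ~ N(0, lam_k),    lam_k  ~ InvGamma(a, b)
      y_k | x_k   ~ N(x_k, sigma2), sigma2 ~ InvGamma(c, d)
    Returns a Rao-Blackwellized posterior-mean estimate of x and the
    posterior-mean noise variance.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    lam = np.full(y.shape, np.var(y))  # initial coefficient variances
    sigma2 = np.var(y)                 # initial noise variance (not assumed known)
    x_sum = np.zeros_like(y)
    sigma2_sum = 0.0
    for it in range(n_iter):
        # x_k | rest: Gaussian with Wiener-style gain g_k = lam_k / (lam_k + sigma2).
        g = lam / (lam + sigma2)
        x = g * y + np.sqrt(g * sigma2) * rng.standard_normal(y.shape)
        # lam_k | x_k: InvGamma(a + 1/2, b + x_k^2 / 2), drawn as 1 / Gamma(shape, scale=1/rate).
        lam = 1.0 / rng.gamma(a + 0.5, 1.0 / (b + 0.5 * x**2))
        # sigma2 | x, y: InvGamma(c + K/2, d + (sum of squared residuals) / 2).
        rss = np.sum((y - x) ** 2)
        sigma2 = 1.0 / rng.gamma(c + 0.5 * y.size, 1.0 / (d + 0.5 * rss))
        if it >= burn_in:
            # Average E[x | lam, sigma2, y] instead of the raw draws of x.
            x_sum += lam / (lam + sigma2) * y
            sigma2_sum += sigma2
    n_kept = n_iter - burn_in
    return x_sum / n_kept, sigma2_sum / n_kept


# Hypothetical test signal: a sparse set of large coefficients plus white noise.
rng = np.random.default_rng(1)
x_true = np.zeros(200)
x_true[::20] = 5.0
y = x_true + 0.5 * rng.standard_normal(200)
x_hat, sigma2_hat = gibbs_denoise(y, n_iter=600, burn_in=200)
```

Each conditional is conjugate, so every step is an exact draw. The noise variance is inferred along with the coefficients instead of being supplied by the user.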
On the Dirichlet Prior and Bayesian Regularization
Steck, Harald, Jaakkola, Tommi S.
In the Bayesian approach, regularization is achieved by specifying a prior distribution over the parameters and subsequently averaging over the posterior distribution. This regularization not only yields smoother parameter estimates than maximum likelihood but also guides the selection of model structures. It was pointed out in [6] that a very large scale parameter of the Dirichlet prior can degrade predictive accuracy due to severe regularization of the parameter estimates. We complement this discussion here and show that a very small scale parameter can lead to poor, over-regularized structures when a product of (conjugate) Dirichlet priors is used over multinomial conditional distributions (Section 3). Section 4 demonstrates the effect of the scale parameter and how it can be calibrated. We focus on the class of Bayesian network models throughout this paper.
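A common instance of a product of conjugate Dirichlet priors over multinomial conditional distributions is the BDeu score. In BDeu, a single scale parameter (the equivalent sample size, `ess` below) is spread uniformly over the cells of each conditional table. Whether this matches the exact prior used in the paper is an assumption. The sketch computes the log marginal likelihood of one node's family from a contingency table. The function name and the example counts are hypothetical.

```python
from math import lgamma


def bdeu_family_score(counts, ess):
    """Log marginal likelihood of one node's data given its parents under a BDeu prior.

    counts[j][k] is the number of records with parent configuration j and child
    state k; ess is the equivalent sample size (the Dirichlet scale parameter).
    """
    q = len(counts)         # number of parent configurations
    r = len(counts[0])      # number of child states
    a_jk = ess / (q * r)    # Dirichlet hyperparameter for each cell
    a_j = ess / q           # sum of hyperparameters for one parent configuration
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score


# Hypothetical counts for a binary child Y with and without a binary parent X.
with_parent = [[8, 2], [3, 7]]
no_parent = [[11, 9]]  # the same data, summed over X
for ess in (0.01, 1.0, 100.0):
    gain = bdeu_family_score(with_parent, ess) - bdeu_family_score(no_parent, ess)
    print(f"ess={ess}: log score gain from adding X -> Y = {gain:.3f}")
```

The score decomposes over families, so this difference is the change in the network score from adding the edge X → Y. Tabulating it over a range of `ess` values is one way to see how the scale parameter shifts structure selection.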
Fast Sparse Gaussian Process Methods: The Informative Vector Machine
Herbrich, Ralf, Lawrence, Neil D., Seeger, Matthias
We present a framework for sparse Gaussian process (GP) methods that uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n) time, where d ≪ n and n is the number of training points), but also to perform training under strong restrictions on time and memory requirements.
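The forward-selection idea can be illustrated in the simplest setting, GP regression with Gaussian noise. There, including point i reduces the posterior entropy by ½·log(1 + ζ_i/σ²), where ζ_i is the current posterior variance at i. The posterior covariance is then updated with a rank-one correction. This is a minimal NumPy sketch under those assumptions, with a hypothetical RBF kernel and hypothetical function names. It precomputes the full kernel matrix for brevity. A version that respects the memory restrictions mentioned above would compute kernel columns on demand, and classification would need an approximate (non-Gaussian) site update instead.

```python
import numpy as np


def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel matrix for the rows of X (an assumed kernel choice)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2 * X @ X.T
    return np.exp(-0.5 * sq / lengthscale**2)


def ivm_select(K, noise_var, d):
    """Greedily pick d points by maximum entropy reduction (regression case)."""
    n = K.shape[0]
    M = np.zeros((0, n))       # low-rank factor: posterior covariance = K - M.T @ M
    zeta = np.diag(K).copy()   # current posterior variances
    active = []
    for _ in range(d):
        # Entropy reduction from observing point i with Gaussian noise.
        delta_h = 0.5 * np.log1p(zeta / noise_var)
        delta_h[active] = -np.inf   # never re-select an included point
        i = int(np.argmax(delta_h))
        # Column i of the current posterior covariance.
        s_i = K[:, i] - M.T @ M[:, i]
        m = s_i / np.sqrt(noise_var + zeta[i])
        M = np.vstack([M, m])       # rank-one downdate of the covariance
        zeta = zeta - m**2
        active.append(i)
    return active, zeta


X = np.linspace(0.0, 10.0, 50)[:, None]
K = rbf_kernel(X, lengthscale=1.0)
active, zeta = ivm_select(K, noise_var=0.1, d=5)
```

Memory grows as O(dn) for `M` rather than O(n²) for a full covariance. The selected points tend to spread out, because including a point shrinks the posterior variance of its neighbours.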
Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
Fischer, Bernd, Schumann, Johann, Buntine, Wray, Gray, Alexander G.
Machine learning has reached a point where many probabilistic methods can be understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms customized for different models. Here, we describe the AUTOBAYES system, which takes a high-level statistical model specification and uses powerful symbolic techniques, based on schema-based program synthesis and computer algebra, to derive an efficient specialized algorithm for learning that model. It then generates executable code implementing that algorithm. This capability goes far beyond that of code collections such as Matlab toolboxes, or even tools for model-independent optimization such as BUGS for Gibbs sampling. Complex new algorithms can be generated without new programming, algorithms can be highly specialized and tightly crafted for the exact structure of the model and data, and efficient, commented code can be generated for different languages or systems.
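For concreteness, here is the kind of algorithm the abstract refers to: a hand-written EM procedure for a one-dimensional Gaussian mixture. This is not AUTOBAYES output and makes no claim about the code it generates. It only shows the E-step/M-step structure that such a system would specialize to each model. The function name and test data are hypothetical.

```python
import numpy as np


def em_gmm_1d(x, k, n_iter=100, seed=0):
    """EM for a k-component 1-D Gaussian mixture; returns weights, means, variances, log-likelihoods."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)  # initialize means at random data points
    var = np.full(k, np.var(x))
    log_liks = []
    for _ in range(n_iter):
        # E-step: log of w_j * N(x_n | mu_j, var_j), normalized per data point.
        log_p = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x[:, None] - mu) ** 2 / var
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        log_liks.append(log_norm.sum())            # log-likelihood under the current parameters
        r = np.exp(log_p - log_norm[:, None])      # responsibilities
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        nk = r.sum(axis=0)
        w = nk / x.size
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var, log_liks


rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
w, mu, var, log_liks = em_gmm_1d(data, k=2)
```

Both steps are closed-form here because the model is conjugate. The recorded log-likelihood never decreases, which is the defining guarantee of EM.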