Bayesian Inference


Exact MAP Estimates by (Hyper)tree Agreement

Neural Information Processing Systems

We describe a method for computing provably exact maximum a posteriori (MAP) estimates for a subclass of problems on graphs with cycles. The basic idea is to represent the original problem on the graph with cycles as a convex combination of tree-structured problems. A convexity argument then guarantees that the optimal value of the original problem (i.e., the log probability of the MAP assignment) is upper bounded by the combined optimal values of the tree problems. We prove that this upper bound is met with equality if and only if the tree problems share an optimal configuration in common. An important implication is that any such shared configuration must also be the MAP configuration for the original problem.
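
A minimal brute-force sketch of the bound on the smallest graph with a cycle, assuming binary variables on a 3-cycle with hypothetical potentials. Dropping one edge at a time gives the three spanning trees, each weighted rho_T = 1/3; every edge then appears in two trees (rho_e = 2/3), so dividing edge potentials by rho_e makes the weighted tree parameters sum back to the original ones. The tree problems are solved here by exhaustive search rather than by max-product, and all helper names are illustrative.

```python
import itertools

def score(config, node_pot, edge_pot):
    """Unnormalized log-probability of a binary configuration."""
    total = sum(node_pot[s][config[s]] for s in range(len(node_pot)))
    total += sum(w[config[s]][config[t]] for (s, t), w in edge_pot.items())
    return total

def brute_force_map(node_pot, edge_pot):
    """Exhaustive MAP search; only sensible for a handful of variables."""
    configs = itertools.product([0, 1], repeat=len(node_pot))
    best = max(configs, key=lambda x: score(x, node_pot, edge_pot))
    return best, score(best, node_pot, edge_pot)

def tree_bound(node_pot, edge_pot):
    """Upper bound sum_T rho_T * max_x score_T(x), valid here for a single cycle."""
    edges = list(edge_pot)
    # On a single cycle, removing any one edge leaves a spanning tree.
    trees = [[e for e in edges if e != dropped] for dropped in edges]
    rho_t = 1.0 / len(trees)
    # Edge appearance probabilities: total weight of the trees containing each edge.
    rho_e = {e: sum(rho_t for tree in trees if e in tree) for e in edges}
    bound, tree_optima = 0.0, []
    for tree in trees:
        # Rescale by 1/rho_e so that sum_T rho_T * theta^T equals theta.
        tree_edges = {e: [[v / rho_e[e] for v in row] for row in edge_pot[e]]
                      for e in tree}
        x_tree, value = brute_force_map(node_pot, tree_edges)
        bound += rho_t * value
        tree_optima.append(x_tree)
    return bound, tree_optima
```

On a frustrated cycle where every edge rewards disagreement, the bound is strict (3.0 against an exact optimum of 2.0). When node and edge potentials all favour the same labelling, the three tree optima coincide, the bound is tight, and the shared configuration is the MAP assignment.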


Maximum Likelihood and the Information Bottleneck

Neural Information Processing Systems

The information bottleneck (IB) method is an information-theoretic formulation for clustering problems. Given a joint distribution p(x, y), this method constructs a new variable T that defines partitions over the values of X that are informative about Y. Maximum likelihood (ML) estimation of mixture models is a standard statistical approach to clustering problems. In this paper, we ask: how are the two methods related? We define a simple mapping between the IB problem and the ML problem for the multinomial mixture model.
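
To make the IB objective concrete, here is a minimal sketch that evaluates the standard IB Lagrangian L = I(T;X) - beta * I(T;Y) for a given soft assignment q(t|x) of values of X to clusters T, assuming T depends on Y only through X. It does not implement the paper's IB-to-ML mapping; the toy distribution and the function names are hypothetical.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in nats for a joint probability table p(a, b)."""
    p_joint = np.asarray(p_joint, dtype=float)
    # Product of the marginals p(a) p(b), via broadcasting (n, 1) * (1, m).
    outer = p_joint.sum(axis=1, keepdims=True) * p_joint.sum(axis=0, keepdims=True)
    nz = p_joint > 0
    return float(np.sum(p_joint[nz] * np.log(p_joint[nz] / outer[nz])))

def ib_lagrangian(p_xy, q_t_given_x, beta):
    """I(T;X) - beta * I(T;Y) for cluster assignments q(t|x) (rows sum to 1)."""
    p_x = p_xy.sum(axis=1)
    p_xt = q_t_given_x * p_x[:, None]   # p(x, t) = p(x) q(t|x)
    p_ty = q_t_given_x.T @ p_xy         # p(t, y) = sum_x q(t|x) p(x, y), Markov chain T - X - Y
    return mutual_information(p_xt) - beta * mutual_information(p_ty)
```

With four equiprobable values of X, where x in {0, 1} always co-occurs with y = 0 and x in {2, 3} with y = 1, grouping {0, 1} and {2, 3} gives I(T;X) = I(T;Y) = log 2, so for beta = 2 the Lagrangian is -log 2, lower than the value 0 of the single-cluster solution.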


A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences

Neural Information Processing Systems

We propose a dynamic Bayesian model for motifs in biopolymer sequences which captures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution are distributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves over previous models that ignore biological priors and positional dependence.
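
The generative side of such a model can be sketched in a few lines, assuming a first-order Markov chain over Dirichlet components along the motif positions and a four-letter (DNA) alphabet. The transition matrix, initial distribution, and Dirichlet parameters below are hypothetical, and the variational EM fitting is not shown.

```python
import numpy as np

def sample_motif(length, init, trans, alphas, n_sites, rng):
    """Draw hidden components, position-specific multinomials, and motif sites."""
    n_comp = len(init)
    states, thetas = [], []
    k = rng.choice(n_comp, p=init)
    for pos in range(length):
        if pos > 0:
            k = rng.choice(n_comp, p=trans[k])      # hidden Markov step between positions
        states.append(k)
        thetas.append(rng.dirichlet(alphas[k]))    # position-specific multinomial parameters
    thetas = np.array(thetas)
    # Each site draws one monomer per position from that position's multinomial.
    sites = np.array([[rng.choice(thetas.shape[1], p=thetas[pos]) for pos in range(length)]
                      for _ in range(n_sites)])
    return np.array(states), thetas, sites

rng = np.random.default_rng(0)
init = np.array([0.5, 0.5])
trans = np.array([[0.8, 0.2], [0.3, 0.7]])
# Hypothetical components: one nearly uniform ("background-like"), one peaked ("conserved").
alphas = np.array([[5.0, 5.0, 5.0, 5.0], [0.5, 0.5, 0.5, 0.5]])
states, thetas, sites = sample_motif(8, init, trans, alphas, n_sites=20, rng=rng)
```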


Application of Variational Bayesian Approach to Speech Recognition

Neural Information Processing Systems

In this paper, we propose a Bayesian framework, variational Bayesian estimation and clustering for speech recognition (VBEC), which constructs shared-state triphone HMMs based on a variational Bayesian approach and recognizes speech based on Bayesian prediction classification. An appropriate model structure with high recognition performance can be found within the VBEC framework. Unlike conventional methods such as BIC- or MDL-based criteria, which rest on the maximum likelihood approach, the proposed model selection is valid in principle even when the amount of data is insufficient, because it does not rely on an asymptotic assumption. In isolated word recognition experiments, we show the advantage of VBEC over conventional methods, especially when dealing with small amounts of data.
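
The idea of classifying with a Bayesian predictive distribution instead of plug-in point estimates can be illustrated with a much simpler conjugate model. This sketch assumes one-dimensional Gaussian classes with a Normal-Gamma prior (hyperparameters chosen arbitrarily), for which the predictive density is a Student-t; it uses exact conjugate updates rather than VBEC's variational treatment of triphone HMMs, and the function names are hypothetical.

```python
import math

def normal_gamma_posterior(data, mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0):
    """Conjugate update of a Normal-Gamma prior on (mean, precision)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * mean) / kappa_n
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2 * kappa_n)
    return mu_n, kappa_n, a_n, b_n

def predictive_logpdf(x, mu_n, kappa_n, a_n, b_n):
    """Student-t predictive: 2*a_n dof, location mu_n, scale^2 = b_n (kappa_n + 1) / (a_n kappa_n)."""
    nu = 2 * a_n
    scale2 = b_n * (kappa_n + 1) / (a_n * kappa_n)
    z = (x - mu_n) ** 2 / (nu * scale2)
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * scale2) - (nu + 1) / 2 * math.log1p(z))

def classify(x, class_data):
    """Pick the class whose posterior predictive density gives x the highest score."""
    scores = {c: predictive_logpdf(x, *normal_gamma_posterior(d)) for c, d in class_data.items()}
    return max(scores, key=scores.get)
```

With only three training samples per class, the predictive densities are heavier-tailed than a Gaussian with plugged-in estimates, reflecting the remaining parameter uncertainty.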


VIBES: A Variational Inference Engine for Bayesian Networks

Neural Information Processing Systems

In recent years variational methods have become a popular tool for approximate inference and learning in a wide variety of probabilistic models. For each new application, however, it is currently necessary first to derive the variational update equations, and then to implement them in application-specific code. Each of these steps is both time consuming and error prone. In this paper we describe a general purpose inference engine called VIBES ('Variational Inference for Bayesian Networks') which allows a wide variety of probabilistic models to be implemented and solved variationally without recourse to coding. New models are specified either through a simple script or via a graphical interface analogous to a drawing package.
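
As an example of the hand-derived update equations described above, here is the textbook mean-field solution for a univariate Gaussian with unknown mean mu and precision tau, assuming the conjugate prior mu | tau ~ N(mu0, 1/(lam0 * tau)), tau ~ Gamma(a0, b0) and the factorization q(mu) q(tau). It is ordinary hand-written Python, not VIBES's script format or API.

```python
def vb_gaussian(data, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3, n_iter=50):
    """Coordinate-ascent updates for q(mu) = N(mu_n, 1/lam_n) and q(tau) = Gamma(a_n, b_n)."""
    n = len(data)
    xbar = sum(data) / n
    e_tau = 1.0                                    # initial guess for E[tau]
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)    # does not depend on q(tau)
    a_n = a0 + (n + 1) / 2                         # fixed by the model structure
    for _ in range(n_iter):
        lam_n = (lam0 + n) * e_tau                 # update q(mu) given E[tau]
        var_mu = 1.0 / lam_n
        # E_q(mu)[sum_i (x_i - mu)^2 + lam0 * (mu - mu0)^2]
        expected_sq = (sum((x - mu_n) ** 2 for x in data) + n * var_mu
                       + lam0 * ((mu_n - mu0) ** 2 + var_mu))
        b_n = b0 + 0.5 * expected_sq               # update q(tau) given q(mu)
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n
```

For a small data set with sample mean 2.0, mu_n lands close to 2.0 and E[tau] = a_n / b_n close to the inverse sample variance, slightly pulled by the weak prior.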


Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

Neural Information Processing Systems

The Bayesian paradigm provides a natural and effective means of exploiting prior knowledge concerning the time-frequency structure of sound signals such as speech and music, something which has often been overlooked in traditional audio signal processing approaches. Here, after constructing a Bayesian model and prior distributions capable of taking into account the time-frequency characteristics of typical audio waveforms, we apply Markov chain Monte Carlo methods in order to sample from the resultant posterior distribution of interest. We present speech enhancement results which compare favourably in objective terms with standard time-varying filtering techniques (and in several cases yield superior performance, both objectively and subjectively); moreover, in contrast to such methods, our results are obtained without an assumption of prior knowledge of the noise power.
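
The point that the noise power need not be known in advance can be illustrated with a deliberately simplified Gibbs sampler, assuming observed coefficients y_k = c_k + n_k with independent priors c_k ~ N(0, v_k), white noise n_k ~ N(0, sigma2), and an inverse-gamma prior on sigma2. The prior variances v_k stand in for the time-frequency structure; this is not the authors' model, and all names and hyperparameter values are hypothetical.

```python
import numpy as np

def gibbs_denoise(y, prior_var, a0=1.0, b0=1.0, n_iter=1200, burn_in=200, seed=0):
    """Posterior means of clean coefficients c and noise variance sigma2."""
    rng = np.random.default_rng(seed)
    sigma2 = np.var(y) / 2                      # crude initial noise variance
    c_sum, s2_sum = np.zeros_like(y), 0.0
    for it in range(n_iter):
        # c_k | sigma2, y ~ N(m_k, s_k): Gaussian prior times Gaussian likelihood.
        s = 1.0 / (1.0 / prior_var + 1.0 / sigma2)
        m = s * y / sigma2
        c = m + np.sqrt(s) * rng.standard_normal(len(y))
        # sigma2 | c, y ~ InvGamma(a0 + K/2, b0 + sum((y - c)^2) / 2), via 1 / Gamma.
        shape = a0 + len(y) / 2
        rate = b0 + 0.5 * np.sum((y - c) ** 2)
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        if it >= burn_in:
            c_sum += c
            s2_sum += sigma2
    n_kept = n_iter - burn_in
    return c_sum / n_kept, s2_sum / n_kept
```

The noise variance is sampled alongside the coefficients, so the enhanced estimate (the posterior mean of c) never requires sigma2 as an input.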


Bayesian Models of Inductive Generalization

Neural Information Processing Systems

We argue that human inductive generalization is best explained in a Bayesian framework, rather than by traditional models based on similarity computations. We go beyond previous work on Bayesian concept learning by introducing an unsupervised method for constructing flexible hypothesis spaces, and we propose a version of the Bayesian Occam's razor that trades off priors and likelihoods to prevent under- or over-generalization in these flexible spaces. We analyze two published data sets on inductive reasoning as well as the results of a new behavioral study that we have carried out.
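
The trade-off between priors and likelihoods can be seen in the standard size-principle formulation of Bayesian concept learning: each hypothesis h consistent with n examples scores p(h) * (1/|h|)^n, and generalization to a new item averages over hypotheses. The number-concept hypotheses and uniform prior below are hypothetical, and the sketch does not reproduce the paper's unsupervised construction of hypothesis spaces.

```python
def generalize(hypotheses, prior, examples, query):
    """Return p(query in concept | examples) and the posterior over hypotheses."""
    unnorm = {}
    for name, extension in hypotheses.items():
        consistent = all(x in extension for x in examples)
        # Size principle: smaller consistent hypotheses explain the examples better.
        unnorm[name] = prior[name] * (1.0 / len(extension)) ** len(examples) if consistent else 0.0
    total = sum(unnorm.values())
    posterior = {name: w / total for name, w in unnorm.items()}
    prob = sum(p for name, p in posterior.items() if query in hypotheses[name])
    return prob, posterior

hypotheses = {
    "even": set(range(2, 101, 2)),
    "powers_of_two": {1, 2, 4, 8, 16, 32, 64},
    "any_number": set(range(1, 101)),
}
prior = {name: 1 / 3 for name in hypotheses}
```

After the single example 16, the item 6 (even but not a power of two) still receives probability of about 0.17; after 16, 8, 2, 64 it drops below 0.01, because the small "powers of two" hypothesis now dominates.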


Incremental Gaussian Processes

Neural Information Processing Systems

In this paper, we consider Tipping's relevance vector machine (RVM) [1] and formalize an incremental training strategy as a variant of the expectation-maximization (EM) algorithm that we call Subspace EM (SSEM). Because SSEM works with a subset of active basis functions, the sparsity of the RVM solution keeps the number of basis functions, and thereby the computational complexity, low. We also introduce a mean field approach to the intractable classification model that is expected to give a very good approximation to exact Bayesian inference and contains the Laplace approximation as a special case. We test the algorithms on two large data sets with on the order of 10^3 to 10^4 examples. The results indicate that Bayesian learning of large data sets, such as the MNIST database, is realistic.
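
The pruning that keeps the active set small can be sketched with the standard RVM regression re-estimation, assuming Gaussian noise and one precision hyperparameter alpha_i per basis function. This uses MacKay-style fixed-point updates, not the paper's SSEM algorithm or its classification model; the pruning threshold and iteration count are arbitrary choices.

```python
import numpy as np

def rvm_regression(phi, t, n_iter=200, prune_at=1e6):
    """Sparse Bayesian regression; returns weights, active indices, noise precision."""
    n, m = phi.shape
    alpha = np.ones(m)
    beta = 1.0 / max(np.var(t), 1e-6)
    active = np.arange(m)
    for _ in range(n_iter):
        p = phi[:, active]
        a = alpha[active]
        # Posterior over the active weights given the current hyperparameters.
        sigma = np.linalg.inv(beta * p.T @ p + np.diag(a))
        mu = beta * sigma @ p.T @ t
        gamma = 1.0 - a * np.diag(sigma)           # how well-determined each weight is
        alpha[active] = gamma / (mu ** 2 + 1e-12)
        resid = t - p @ mu
        beta = (n - gamma.sum()) / (resid @ resid + 1e-12)
        # Drop basis functions whose prior precision has effectively diverged.
        active = active[alpha[active] < prune_at]
    p = phi[:, active]
    sigma = np.linalg.inv(beta * p.T @ p + np.diag(alpha[active]))
    w = np.zeros(m)
    w[active] = beta * sigma @ p.T @ t
    return w, active, beta
```

Each iteration inverts only a matrix over the currently active basis functions, which is where sparsity pays off computationally.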


Handling Missing Data with Variational Bayesian Learning of ICA

Neural Information Processing Systems

Missing data are common in real-world datasets and are a problem for many estimation techniques. We have developed a variational Bayesian method to perform Independent Component Analysis (ICA) on high-dimensional data containing missing entries. Missing data are handled naturally in the Bayesian framework by integrating them out of the generative density model. Modeling the distributions of the independent sources with mixtures of Gaussians allows sources to be estimated with different kurtosis and skewness. The variational Bayesian method automatically determines the dimensionality of the data and yields an accurate density model for the observed data without overfitting problems.
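
How missing entries drop out of the likelihood can be shown directly for the linear model x = A s + noise with mixture-of-Gaussians sources: integrating out a missing coordinate deletes the corresponding row of A, so each combination of source components contributes a Gaussian over the observed coordinates only. This sketch assumes isotropic Gaussian noise and known parameters; it evaluates the likelihood and does not perform the variational learning. The function names are hypothetical, and enumerating component combinations is only practical for a few sources.

```python
import itertools
import numpy as np

def gauss_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def observed_loglik(x, mask, A, noise_var, weights, means, variances):
    """log p(x_obs) under x = A s + noise with MoG sources; missing entries integrated out."""
    obs = np.flatnonzero(mask)
    A_o, x_o = A[obs], x[obs]        # missing rows simply disappear
    terms = []
    for combo in itertools.product(*(range(len(w)) for w in weights)):
        log_w = sum(np.log(weights[j][k]) for j, k in enumerate(combo))
        m = np.array([means[j][k] for j, k in enumerate(combo)])
        v = np.array([variances[j][k] for j, k in enumerate(combo)])
        cov = A_o @ np.diag(v) @ A_o.T + noise_var * np.eye(len(obs))
        terms.append(log_w + gauss_logpdf(x_o, A_o @ m, cov))
    terms = np.array(terms)
    top = terms.max()                # log-sum-exp over component combinations
    return float(top + np.log(np.exp(terms - top).sum()))
```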


Evidence Optimization Techniques for Estimating Stimulus-Response Functions

Neural Information Processing Systems

An essential step in understanding the function of sensory nervous systems is to characterize as accurately as possible the stimulus-response function (SRF) of the neurons that relay and process sensory information. One increasingly common experimental approach is to present a rapidly varying complex stimulus to the animal while recording the responses of one or more neurons, and then to directly estimate a functional transformation of the input that accounts for the neuronal firing. The estimation techniques usually employed, such as Wiener filtering or other correlation-based estimation of the Wiener or Volterra kernels, are equivalent to maximum likelihood estimation in a Gaussian-output-noise regression model. We explore the use of Bayesian evidence-optimization techniques to condition these estimates. We show that by learning hyperparameters that control the smoothness and sparsity of the transfer function it is possible to improve dramatically the quality of SRF estimates, as measured by their success in predicting responses to novel input.
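
Evidence optimization can be sketched for the simplest Gaussian case, assuming a linear SRF y = X w + noise, a single isotropic prior w ~ N(0, I/alpha), and white noise of variance noise_var, with both hyperparameters chosen on a grid by maximizing the log marginal likelihood. The paper's priors control smoothness and sparsity and are richer than this one-parameter ridge prior; the function names and grid are hypothetical.

```python
import itertools
import numpy as np

def log_evidence(X, y, alpha, noise_var):
    """log p(y | X, alpha, noise_var) for y = X w + e, w ~ N(0, I/alpha), e ~ N(0, noise_var I)."""
    n = len(y)
    cov = noise_var * np.eye(n) + (X @ X.T) / alpha   # marginal covariance of y
    _, logdet = np.linalg.slogdet(cov)
    return float(-0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(cov, y)))

def fit_srf(X, y, alphas, noise_vars):
    """Grid-search the evidence, then return the posterior-mean filter."""
    alpha, noise_var = max(itertools.product(alphas, noise_vars),
                           key=lambda h: log_evidence(X, y, *h))
    precision = X.T @ X / noise_var + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(precision, X.T @ y / noise_var)
    return w, alpha, noise_var
```

A grid keeps the sketch transparent; with more hyperparameters, as for smoothness and sparsity priors, gradient-based or fixed-point maximization of the evidence would replace it.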