Goto

Collaborating Authors

 Uncertainty


Identity Uncertainty and Citation Matching

Neural Information Processing Systems

Identity uncertainty is a pervasive problem in real-world data analysis. It arises whenever objects are not labeled with unique identifiers or when those identifiers may not be perceived perfectly. In such cases, two observations mayor may not correspond to the same object. In this paper, we consider the problem in the context of citation matching--the problem ofdeciding which citations correspond to the same publication. Our approach is based on the use of a relational probability model to define a generative model for the domain, including models of author and title corruption and a probabilistic citation grammar. Identity uncertainty is handled by extending standard models to incorporate probabilities over the possible mappings between terms in the language and objects in the domain. Inference is based on Markov chain Monte Carlo, augmented with specific methods for generating efficient proposals when the domain contains many objects. Results on several citation data sets show that the method outperforms current algorithms for citation matching. The declarative, relational nature of the model also means that our algorithm can determine object characteristics such as author names by combining multiple citations of multiple papers.


Handling Missing Data with Variational Bayesian Learning of ICA

Neural Information Processing Systems

Modeling the distributions of the independent sources with mixture of Gaussians allows sources to be estimated with different kurtosis and skewness. The variational Bayesian method automatically determines the dimensionality of the data and yields an accurate density model for the observed data without overfitting problems.


Using Tarjan's Red Rule for Fast Dependency Tree Construction

Neural Information Processing Systems

We focus on the problem of efficient learning of dependency trees. It is well-known that given the pairwise mutual information coefficients, a minimum-weight spanning tree algorithm solves this problem exactly and in polynomial time. However, for large data-sets it is the construction ofthe correlation matrix that dominates the running time. We have developed a new spanning-tree algorithm which is capable of exploiting partial knowledge about edge weights. The partial knowledge we maintain isa probabilistic confidence interval on the coefficients, which we derive by examining just a small sample of the data. The algorithm is able to flag the need to shrink an interval, which translates to inspection ofmore data for the particular attribute pair. Experimental results show running time that is near-constant in the number of records, without significantloss in accuracy of the generated trees. Interestingly, our spanning-tree algorithm is based solely on Tarjan's red-edge rule, which is generally considered a guaranteed recipe for bad performance.


Dynamical Causal Learning

Neural Information Processing Systems

This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1 Introduction Currently active quantitative models of human causal judgment for single (and sometimes multiple) causes include conditional


Application of Variational Bayesian Approach to Speech Recognition

Neural Information Processing Systems

Application of V ariational Bayesian Approach to Speech Recognition Shinji Watanabe, Y asuhiro Minami, Atsushi Nakamura and Naonori Ueda NTT Communication Science Laboratories, NTT Corporation 2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan {watanabe,minami,ats,ueda}@cslab.kecl.ntt.co.jp Abstract In this paper, we propose a Bayesian framework, which constructs shared-state triphone HMMs based on a variational Bayesian approach, and recognizes speech based on the Bayesian prediction classification; variational Bayesian estimation and clustering for speech recognition (VBEC). An appropriate model structure with high recognition performance can be found within a VBEC framework. Unlike conventional methods, including BIC or MDL criterion based on the maximum likelihood approach, the proposed model selection is valid in principle, even when there are insufficient amounts of data, because it does not use an asymptotic assumption. In isolated word recognition experiments, we show the advantage of VBEC over conventional methods, especially when dealing with small amounts of data. 1 Introduction A statistical modeling of spectral features of speech (acoustic modeling) is one of the most crucial parts in the speech recognition. In acoustic modeling, a triphone-based hidden Markov model (triphone HMM) has been widely employed.


Discriminative Densities from Maximum Contrast Estimation

Neural Information Processing Systems

We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions ina way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. We show that maximization ofthe contrast is equivalent to minimization of an approximation of the Bayes risk.


A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences

Neural Information Processing Systems

We propose a dynamic Bayesian model for motifs in biopolymer sequences whichcaptures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution aredistributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EMalgorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves overprevious models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.


Exact MAP Estimates by (Hyper)tree Agreement

Neural Information Processing Systems

We describe a method for computing provably exact maximum a posteriori (MAP)estimates for a subclass of problems on graphs with cycles. The basic idea is to represent the original problem on the graph with cycles asa convex combination of tree-structured problems. A convexity argument then guarantees that the optimal value of the original problem (i.e., the log probability of the MAP assignment) is upper bounded by the combined optimal values of the tree problems. We prove that this upper bound is met with equality if and only if the tree problems share an optimal configurationin common. An important implication is that any such shared configuration must also be the MAP configuration for the original problem. Next we develop a tree-reweighted max-product algorithm for attempting to find convex combinations of tree-structured problems that share a common optimum. We give necessary and sufficient conditions for a fixed point to yield the exact MAP estimate. An attractive feature of our analysis is that it generalizes naturally to convex combinations of hypertree-structured distributions.


Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

Neural Information Processing Systems

The Bayesian paradigm provides a natural and effective means of exploiting priorknowledge concerning the time-frequency structure of sound signals such as speech and music--something which has often been overlooked intraditional audio signal processing approaches. Here, after constructing aBayesian model and prior distributions capable of taking into account the time-frequency characteristics of typical audio waveforms, we apply Markov chain Monte Carlo methods in order to sample from the resultant posterior distribution of interest. We present speech enhancement resultswhich compare favourably in objective terms with standard time-varying filtering techniques (and in several cases yield superior performance, bothobjectively and subjectively); moreover, in contrast to such methods, our results are obtained without an assumption of prior knowledge of the noise power.


On the Dirichlet Prior and Bayesian Regularization

Neural Information Processing Systems

In the Bayesian approach, regularizationis achieved by specifying a prior distribution over the parameters and subsequently averaging over the posterior distribution. This regularization provides not only smoother estimates of the parameters compared to maximum likelihood but also guides the selection of model structures. It was pointed out in [6] that a very large scale parameter of the Dirichlet prior can degrade predictive accuracy due to severe regularization of the parameter estimates. We complement this discussion here and show that a very small scale parameter can lead to poor over-regularized structures when a product of (conjugate) Dirichlet priors is used over multinomial conditional distributions (Section 3). Section 4 demonstrates the effect of the scale parameter and how it can be calibrated. We focus on the class of Bayesian network models throughout this paper.