Bayesian Inference
Penalized Likelihood Methods for Estimation of Sparse High Dimensional Directed Acyclic Graphs
Shojaie, Ali, Michailidis, George
Directed acyclic graphs (DAGs) are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical, as well as biological systems, where directed edges between nodes represent the influence of components of the system on each other. The general problem of estimating DAGs from observed data is computationally NP-hard, Moreover two directed graphs may be observationally equivalent. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose a penalized likelihood approach that directly estimates the adjacency matrix of DAGs. Both lasso and adaptive lasso penalties are considered and an efficient algorithm is proposed for estimation of high dimensional DAGs. We study variable selection consistency of the two penalties when the number of variables grows to infinity with the sample size. We show that although lasso can only consistently estimate the true network under stringent assumptions, adaptive lasso achieves this task under mild regularity conditions. The performance of the proposed methods is compared to alternative methods in simulated, as well as real, data examples.
Likelihood-based semi-supervised model selection with applications to speech processing
White, Christopher M., Khudanpur, Sanjeev P., Wolfe, Patrick J.
In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some other means. In the context of speech processing systems and other large-scale practical applications, however, such labeled development data are typically costly and difficult to obtain. This article proposes an alternative semi-supervised framework for likelihood-based model selection that leverages unlabeled data by using trained classifiers representing each model to automatically generate putative labels. The errors that result from this automatic labeling are shown to be amenable to results from robust statistics, which in turn provide for minimax-optimal censored likelihood ratio tests that recover the nonparametric sign test as a limiting case. This approach is then validated experimentally using a state-of-the-art automatic speech recognition system to select between candidate word pronunciations using unlabeled speech data that only potentially contain instances of the words under test. Results provide supporting evidence for the utility of this approach, and suggest that it may also find use in other applications of machine learning.
Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches
Naseem, T., Snyder, B., Eisenstein, J., Barzilay, R.
We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The central assumption of our work is that by combining cues from multiple languages, the structure of each becomes more apparent. We consider two ways of applying this intuition to the problem of unsupervised part-of-speech tagging: a model that directly merges tag structures for a pair of languages into a single sequence and a second model which instead incorporates multilingual context using latent variables. Both approaches are formulated as hierarchical Bayesian models, using Markov Chain Monte Carlo sampling techniques for inference. Our results demonstrate that by incorporating multilingual evidence we can achieve impressive performance gains across a range of scenarios. We also found that performance improves steadily as the number of available languages increases.
A Hierarchical Bayesian Model for Frame Representation
Chaรขri, L., Pesquet, J. -C., Tourneret, J. -Y., Ciuciu, Ph., Benazza-Benyahia, A.
In many signal processing problems, it may be fruitful to represent the signal under study in a frame. If a probabilistic approach is adopted, it becomes then necessary to estimate the hyper-parameters characterizing the probability distribution of the frame coefficients. This problem is difficult since in general the frame synthesis operator is not bijective. Consequently, the frame coefficients are not directly observable. This paper introduces a hierarchical Bayesian model for frame representation. The posterior distribution of the frame coefficients and model hyper-parameters is derived. Hybrid Markov Chain Monte Carlo algorithms are subsequently proposed to sample from this posterior distribution. The generated samples are then exploited to estimate the hyper-parameters and the frame coefficients of the target signal. Validation experiments show that the proposed algorithms provide an accurate estimation of the frame coefficients and hyper-parameters. Application to practical problems of image denoising show the impact of the resulting Bayesian estimation on the recovered signal quality.
Managing Conversation Uncertainty in TutorJ
Cannella, Vincenzo (University of Palermo) | Pirrone, Roberto (University of Palermo)
Uncertainty in natural language dialogue is often treated through stochastic models. Some of the authors already presented TutorJ that is an Intelligent Tutoring System, whose interaction with the user is very intensive, and makes use of both dialogic and graphical modality. When managing the interaction, the system needs to cope with uncertainty due to the understanding of the user's needs and wishes. In this paper we present the extended version of TutorJ, focusing on the new features added to its chatbot module. These features allow to merge deterministic and probabilistic reasoning in dialogue management, and in writing the rules of the system's procedural memory.
Causal Inference on Discrete Data using Additive Noise Models
Peters, Jonas, Janzing, Dominik, Schรถlkopf, Bernhard
Inferring causal relations by analyzing statistical dependences among observed random variables is a challenging task if no controlled randomized experiments are available. Socalled constraint-based approaches to causal discovery (Pearl, 2000; Spirtes et al., 1993) select among all directed acyclic graphs (DAGs) those that satisfy the Markov condition and the faithfulness assumption, i.e., those for which the observed independences are imposed by the structure rather than being a result of specific choices of parameters of the Bayesian network. These approaches are unable to distinguish among causal DAGs that impose the same independences. In particular, it is impossible to distinguish between X Y and Y X. More recently, several methods have been suggested that use not only conditional independences, but also more sophisticated properties of the joint distribution. For simplicity, we explain the ideas for the two variable setting since this case is particularly challenging. Kano & Shimizu (2003) use models Y f(X) N (1) where f is a linear function and N is additive noise that is independent of the hypothetical cause X. This is an example for an additive noise model from X to Y. Apart from trivial
Which graphical models are difficult to learn?
Bento, Jose, Montanari, Andrea
We consider the problem of learning the structure of Ising models (pairwise binary Markov random fields) from i.i.d. samples. While several methods have been proposed to accomplish this task, their relative merits and limitations remain somewhat obscure. By analyzing a number of concrete examples, we show that low-complexity algorithms systematically fail when the Markov random field develops long-range correlations. More precisely, this phenomenon appears to be related to the Ising model phase transition (although it does not coincide with it).
Content Modeling Using Latent Permutations
Chen, H., Branavan, S.R.K., Barzilay, R., Karger, D. R.
We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
Learning Class-Level Bayes Nets for Relational Data
Schulte, Oliver, Khosravi, Hassan, Moser, Flavia, Ester, Martin
Many databases store data in relational format, with different types of entities and information about links between the entities. The field of statistical-relational learning (SRL) has developed a number of new statistical models for such data. In this paper we focus on learning class-level or first-order dependencies, which model the general database statistics over attributes of linked objects and links (e.g., the percentage of A grades given in computer science classes). Class-level statistical relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. Most current SRL methods find class-level dependencies, but their main task is to support instance-level predictions about the attributes or links of specific entities. We focus only on class-level prediction, and describe algorithms for learning class-level models that are orders of magnitude faster for this task. Our algorithms learn Bayes nets with relational structure, leveraging the efficiency of single-table nonrelational Bayes net learners. An evaluation of our methods on three data sets shows that they are computationally feasible for realistic table sizes, and that the learned structures represent the statistical information in the databases well. After learning compiles the database statistics into a Bayes net, querying these statistics via Bayes net inference is faster than with SQL queries, and does not depend on the size of the database.
Efficient Bayesian analysis of multiple changepoint models with dependence across segments
We consider Bayesian analysis of a class of multiple changepoint models. While there are a variety of efficient ways to analyse these models if the parameters associated with each segment are independent, there are few general approaches for models where the parameters are dependent. Under the assumption that the dependence is Markov, we propose an efficient online algorithm for sampling from an approximation to the posterior distribution of the number and position of the changepoints. In a simulation study, we show that the approximation introduced is negligible. We illustrate the power of our approach through fitting piecewise polynomial models to data, under a model which allows for either continuity or discontinuity of the underlying curve at each changepoint. This method is competitive with, or out-performs, other methods for inferring curves from noisy data; and uniquely it allows for inference of the locations of discontinuities in the underlying curve.