Bayesian Inference
Representing Aggregators in Relational Probabilistic Models
Buchman, David (University of British Columbia) | Poole, David (University of British Columbia)
We consider the following problem: given a probabilistic model on a set of random variables, how can we add a new variable that depends on the other variables without changing the original distribution? In particular, we consider relational models (such as Markov logic networks (MLNs)), where we cannot directly define conditional probabilities. In relational models, there may be an unbounded number of parents in the grounding, and conditional distributions need to be defined in terms of aggregators. The question we ask is whether and when it is possible to represent conditional probabilities at all in various relational models. Some aggregators have been shown to be representable by MLNs, by adding auxiliary variables; however, it was unknown whether they could be defined without auxiliary variables. For other aggregators, it was not known whether they can be represented by MLNs at all. We obtained surprisingly strong negative results on the capability of flexible undirected relational models such as MLNs to represent aggregators without affecting the original model's distribution. We provide a map of what aspects of the models, including the use of auxiliary variables and quantifiers, result in the ability to represent various aggregators. In addition, we provide proof techniques which can be used to facilitate future theoretical results on relational models, and demonstrate them on relational logistic regression (RLR).
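As a rough illustration of why aggregators are needed, the sketch below computes a conditional probability that depends only on a count over parent values, so it stays well defined for any number of ground parents. It is a minimal logistic-regression-style aggregator in the spirit of RLR, not the authors' formulation; the function name and the weights are hypothetical.

```python
import math


def logistic_aggregator(parent_values, w0=-2.0, w_true=1.0):
    """P(child = True | parents) as a sigmoid of a weighted count.

    parent_values: iterable of booleans, one per ground parent.
    w0, w_true: hypothetical weights chosen only for illustration.
    """
    n_true = sum(1 for v in parent_values if v)
    z = w0 + w_true * n_true
    return 1.0 / (1.0 + math.exp(-z))


# The same definition applies to 2 parents or to 200 parents.
p_small = logistic_aggregator([True, False])
p_large = logistic_aggregator([True] * 5 + [False] * 195)
```

Because the probability is a function of the count alone, the definition does not need to change when the population size does.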
Linear-Time Gibbs Sampling in Piecewise Graphical Models
Afshar, Hadi Mohasel (Australian National University, NICTA) | Sanner, Scott (NICTA, Australian National University) | Abbasnejad, Ehsan (Australian National University, NICTA)
Many real-world Bayesian inference problems such as preference learning or trader valuation modeling in financial markets naturally use piecewise likelihoods. Unfortunately, exact closed-form inference in the underlying Bayesian graphical models is intractable in the general case and existing approximation techniques provide few guarantees on both approximation quality and efficiency. While (Markov Chain) Monte Carlo methods provide an attractive asymptotically unbiased approximation approach, rejection sampling and Metropolis-Hastings both prove inefficient in practice, and analytical derivation of Gibbs samplers requires exponential space and time in the amount of data. In this work, we show how to transform problematic piecewise likelihoods into equivalent mixture models and then provide a blocked Gibbs sampling approach for this transformed model that achieves an exponential-to-linear reduction in space and time compared to a conventional Gibbs sampler. This enables fast, asymptotically unbiased Bayesian inference in a new expressive class of piecewise graphical models and empirically requires orders of magnitude less time than rejection, Metropolis-Hastings, and conventional Gibbs sampling methods to achieve the same level of accuracy.
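For readers unfamiliar with the baseline, a conventional Gibbs sampler alternates draws from each variable's full conditional distribution. The sketch below does this for a toy bivariate Gaussian with unit variances, whose conditionals are available in closed form. It does not implement the piecewise-to-mixture transformation or the blocked sampler described in the abstract, and all names are illustrative.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    Each full conditional is Gaussian:
        x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # resample x given current y
        y = rng.gauss(rho * x, cond_sd)  # resample y given the new x
        if step >= burn_in:
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Empirical Pearson correlation of a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cov = sum((x - mx) * (y - my) for x, y in samples) / n
    vx = sum((x - mx) ** 2 for x, _ in samples) / n
    vy = sum((y - my) ** 2 for _, y in samples) / n
    return cov / (vx * vy) ** 0.5
```

With enough samples, `sample_correlation(gibbs_bivariate_normal(0.8, 20000))` should land close to 0.8, which is a quick sanity check that the chain targets the intended distribution.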
Leveraging Features and Networks for Probabilistic Tensor Decomposition
Rai, Piyush (Duke University) | Wang, Yingjian (PhD Student) | Carin, Lawrence (Professor)
We present a probabilistic model for tensor decomposition where one or more tensor modes may have side-information about the mode entities in the form of their features and/or their adjacency network. We consider a Bayesian approach based on the Canonical PARAFAC (CP) decomposition and enrich this single-layer decomposition approach with a two-layer decomposition. The second layer fits a factor model for each layer-one factor matrix and models the factor matrix via the mode entities' features and/or the network between the mode entities. The second-layer decomposition of each factor matrix also learns a binary latent representation for the entities of that mode, which can be useful in its own right. Our model can handle both continuous and binary tensor observations. Another appealing aspect of our model is the simplicity of the model inference, with easy-to-sample Gibbs updates. We demonstrate the results of our model on several benchmark datasets, consisting of both real and binary tensors.
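To make the single-layer part concrete, a CP decomposition of a three-way tensor writes each entry as a sum over rank components of products of one factor-matrix entry per mode. The sketch below only reconstructs a tensor from given factor matrices; it is not the authors' Bayesian model, it omits the second layer entirely, and the factor values are made up for illustration.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factor matrices.

    A, B, C are lists of rows, each with R columns (the rank).
    Entry (i, j, k) equals the sum over r of A[i][r] * B[j][r] * C[k][r].
    """
    rank = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Rank-2 toy factors for a 2 x 3 x 2 tensor (hypothetical numbers).
A = [[1.0, 0.0], [0.0, 2.0]]
B = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
C = [[1.0, 1.0], [0.5, 2.0]]
X = cp_reconstruct(A, B, C)
```

Each mode has its own factor matrix, which is where mode-specific side information such as features or a network can be attached.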
The Hybrid Nested/Hierarchical Dirichlet Process and its Application to Topic Modeling with Word Differentiation
Ma, Tengfei (The University of Tokyo) | Sato, Issei (The University of Tokyo) | Nakagawa, Hiroshi (The University of Tokyo)
The hierarchical Dirichlet process (HDP) is a powerful nonparametric Bayesian approach to modeling groups of data which allows the mixture components in each group to be shared. However, in many cases the groups themselves belong to latent groups (categories), which may strongly affect the modeling. In order to utilize the unknown category information of grouped data, we present the hybrid nested/hierarchical Dirichlet process (hNHDP), a prior that blends the desirable aspects of both the HDP and the nested Dirichlet process (NDP). Specifically, we introduce a clustering structure for the groups. The prior distribution for each cluster is a realization of a Dirichlet process. Moreover, the cluster-specific distributions can share some of their atoms between groups, and the shared atoms and specific atoms are generated separately. We apply the hNHDP to document modeling and bring in a mechanism to identify discriminative words and topics. We derive an efficient Markov chain Monte Carlo scheme for posterior inference and present experiments on document modeling.
Bayesian Maximum Margin Principal Component Analysis
Du, Changying (Chinese Academy of Sciences) | Zhe, Shandian (Purdue University) | Zhuang, Fuzhen (Chinese Academy of Sciences) | Qi, Yuan (Purdue University) | He, Qing (Chinese Academy of Sciences) | Shi, Zhongzhi (Chinese Academy of Sciences)
Supervised dimensionality reduction has shown great advantages in finding predictive subspaces. Previous methods rarely consider the popular maximum margin principle and are prone to overfitting on the usually small training data, especially those under the maximum likelihood framework. In this paper, we present a posterior-regularized Bayesian approach to combine Principal Component Analysis (PCA) with max-margin learning. Based on the data augmentation idea for max-margin learning and the probabilistic interpretation of PCA, our method can automatically infer the weight and penalty parameter of the max-margin learning machine, while simultaneously finding the most appropriate PCA subspace under the Bayesian framework. We develop a fast mean-field variational inference algorithm to approximate the posterior. Experimental results on various classification tasks show that our method outperforms a number of competitors.
Online Bayesian Models for Personal Analytics in Social Media
Volkova, Svitlana (Johns Hopkins University) | Van Durme, Benjamin (Johns Hopkins University)
Latent author attribute prediction in social media provides a novel set of conditions for the construction of supervised classification models. With individual authors as training and test instances, their associated content ("features") is made available incrementally over time, as they converse over discussion forums. We propose various approaches to handling this dynamic data, from traditional batch training and testing, to incremental bootstrapping, and then active learning via crowdsourcing. Our underlying model relies on an intuitive application of Bayes' rule, which should be easy to adopt by the community, thus allowing for a general shift towards online modeling for social media.
Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning
Garrette, Dan (University of Texas at Austin) | Dyer, Chris (Carnegie Mellon University) | Baldridge, Jason (University of Texas at Austin) | Smith, Noah A. (Carnegie Mellon University)
Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Categories are selected from a large, recursively-defined set; this leads to high word-to-category ambiguity, which is one of the primary factors that make learning CCG parsers difficult, especially in the face of little data. Previous work has shown that learning sequence models for CCG tagging can be improved by using linguistically-motivated prior probability distributions over potential categories. We extend this approach to the task of learning a CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.
An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types
Albrecht, Stefano Vittorino (The University of Edinburgh) | Crandall, Jacob William (Masdar Institute of Science and Technology) | Ramamoorthy, Subramanian (The University of Edinburgh)
Many multiagent applications require an agent to learn quickly how to interact with previously unknown other agents. To address this problem, researchers have studied learning algorithms which compute posterior beliefs over a hypothesised set of policies, based on the observed actions of the other agents. The posterior belief is complemented by the prior belief, which specifies the subjective likelihood of policies before any actions are observed. In this paper, we present the first comprehensive empirical study on the practical impact of prior beliefs over policies in repeated interactions. We show that prior beliefs can have a significant impact on the long-term performance of such methods, and that the magnitude of the impact depends on the depth of the planning horizon. Moreover, our results demonstrate that automatic methods can be used to compute prior beliefs with consistent performance effects. This indicates that prior beliefs could be eliminated as a manual parameter and instead be computed automatically.
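The belief computation described above can be sketched as a repeated application of Bayes' rule over a fixed set of hypothesised policies. The example below is a simplification, not the algorithms studied in the paper: it assumes each policy is a stationary distribution over actions (real policies may condition on history), and the policy names and probabilities are hypothetical.

```python
def update_beliefs(prior, policies, observed_actions):
    """Posterior over hypothesised policies after observing actions.

    prior: dict mapping policy name -> prior probability.
    policies: dict mapping policy name -> {action: probability}.
    observed_actions: sequence of actions taken by the other agent.
    """
    posterior = dict(prior)
    for action in observed_actions:
        # Multiply each belief by the likelihood of the observed action.
        for name in posterior:
            posterior[name] *= policies[name].get(action, 0.0)
        total = sum(posterior.values())
        if total == 0.0:
            raise ValueError("action has zero probability under every policy")
        # Renormalise so the beliefs sum to one.
        posterior = {name: p / total for name, p in posterior.items()}
    return posterior


policies = {
    "mostly_cooperate": {"C": 0.9, "D": 0.1},
    "mostly_defect": {"C": 0.2, "D": 0.8},
}
uniform = {"mostly_cooperate": 0.5, "mostly_defect": 0.5}
skewed = {"mostly_cooperate": 0.1, "mostly_defect": 0.9}
```

Calling `update_beliefs` with `uniform` and with `skewed` on the same short action sequence gives different posteriors, which is the sensitivity to the prior that the study examines; with many observations the two posteriors move closer together.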
Sparse Bayesian Multiview Learning for Simultaneous Association Discovery and Diagnosis of Alzheimer's Disease
Zhe, Shandian (Purdue University) | Xu, Zenglin (University of Electronic Science and Technology of China) | Qi, Yuan (Purdue University) | Yu, Peng (Eli Lilly and Company)
In the analysis and diagnosis of many diseases, such as Alzheimer's disease (AD), two important and related tasks are usually required: i) selecting genetic and phenotypical markers for diagnosis, and ii) identifying associations between genetic and phenotypical features. While previous studies treat these two tasks separately, they are tightly coupled due to the same underlying biological basis. To harness their potential benefits for each other, we propose a new sparse Bayesian approach to carry out the two tasks jointly. In our approach, we extract common latent features from different data sources by sparse projection matrices and then use the latent features to predict disease severity levels; in return, the disease status can guide the learning of sparse projection matrices, which not only reveal interactions between data sources but also select groups of related biomarkers. In order to boost the learning of sparse projection matrices, we further incorporate graph Laplacian priors encoding the valuable linkage disequilibrium (LD) information. To efficiently estimate the model, we develop a variational inference algorithm. Analysis of an imaging genetics dataset for AD study shows that our model discovers biologically meaningful associations between single nucleotide polymorphisms (SNPs) and magnetic resonance imaging (MRI) features, and achieves significantly higher accuracy for predicting ordinal AD stages than competitive methods.
Bayesian Approach to Modeling and Detecting Communities in Signed Network
Yang, Bo (Jilin University) | Zhao, Xuehua (Jilin University) | Liu, Xueyan (Jilin University)
There has been increasing interest in exploring signed networks with positive and negative links because they contain more information than unsigned networks. As fundamental problems of signed network analysis, community detection and sign (or attitude) prediction are still primary challenges. To address them, we propose a generative Bayesian approach, in which 1) a signed stochastic blockmodel is proposed to characterize the community structure in the context of signed networks, by means of explicitly formulating the distributions of both density and frustration of signed links from a stochastic perspective, and 2) a model learning algorithm is proposed by theoretically deriving a variational Bayes EM for parameter estimation and a variation-based approximate evidence for model selection. In comparisons with state-of-the-art methods on synthetic and real-world networks, the proposed approach shows its superiority in both community detection and sign prediction for exploratory networks.