Uncertainty
Compress and Control
Veness, Joel (Google DeepMind) | Bellemare, Marc G (Google DeepMind) | Hutter, Marcus (Australian National University) | Chua, Alvin (Google DeepMind) | Desjardins, Guillaume (Google DeepMind)
This paper describes a new information-theoretic policy evaluation technique for reinforcement learning. This technique converts any compression or density model into a corresponding estimate of value. Under appropriate stationarity and ergodicity conditions, we show that the use of a sufficiently powerful model gives rise to a consistent value function estimator. We also study the behavior of this technique when applied to various Atari 2600 video games, where the use of suboptimal modeling techniques is unavoidable. We consider three fundamentally different models, all too limited to perfectly model the dynamics of the system. Remarkably, we find that our technique provides sufficiently accurate value estimates for effective on-policy control. We conclude with a suggestive study highlighting the potential of our technique to scale to large problems.
Leveraging Features and Networks for Probabilistic Tensor Decomposition
Rai, Piyush (Duke University) | Wang, Yingjian (PhD Student) | Carin, Lawrence (Professor)
We present a probabilistic model for tensor decomposition where one or more tensor modes may have side-information about the mode entities in form of their features and/or their adjacency network. We consider a Bayesian approach based on the Canonical PARAFAC (CP) decomposition and enrich this single-layer decomposition approach with a two-layer decomposition. The second layer fits a factor model for each layer-one factor matrix and models the factor matrix via the mode entities' features and/or the network between the mode entities. The second-layer decomposition of each factor matrix also learns a binary latent representation for the entities of that mode, which can be useful in its own right. Our model can handle both continuous as well as binary tensor observations. Another appealing aspect of our model is the simplicity of the model inference, with easy-to-sample Gibbs updates. We demonstrate the results of our model on several benchmarks datasets, consisting of both real and binary tensors.
Detecting Change Points in the Large-Scale Structure of Evolving Networks
Peel, Leto (University of Colorado at Boulder) | Clauset, Aaron (University of Colorado at Boulder)
Interactions among people or objects are often dynamic in nature and can be represented as a sequence of networks, each providing a snapshot of the interactions over a brief period of time. An important task in analyzing such evolving networks is change-point detection, in which we both identify the times at which the large-scale pattern of interactions changes fundamentally and quantify how large and what kind of change occurred. Here, we formalize for the first time the network change-point detection problem within an online probabilistic learning framework and introduce a method that can reliably solve it. This method combines a generalized hierarchical random graph model with a Bayesian hypothesis test to quantitatively determine if, when, and precisely how a change point has occurred. We analyze the detectability of our method using synthetic data with known change points of different types and magnitudes, and show that this method is more accurate than several previously used alternatives. Applied to two high-resolution evolving social networks, this method identifies a sequence of change points that align with known external ``shocks'' to these networks.
The Hybrid Nested/Hierarchical Dirichlet Process and its Application to Topic Modeling with Word Differentiation
Ma, Tengfei (The University of Tokyo) | Sato, Issei (The University of Tokyo) | Nakagawa, Hiroshi (The University of Tokyo)
The hierarchical Dirichlet process (HDP) is a powerful nonparametric Bayesian approach to modeling groups of data which allows the mixture components in each group to be shared. However, in many cases the groups themselves are also in latent groups (categories) which may impact the modeling a lot. In order to utilize the unknown category information of grouped data, we present the hybrid nested/ hierarchical Dirichlet process (hNHDP), a prior that blends the desirable aspects of both the HDP and the nested Dirichlet Process (NDP). Specifically, we introduce a clustering structure for the groups. The prior distribution for each cluster is a realization of a Dirichlet process. Moreover, the set of cluster-specific distributions can share part of atoms between groups, and the shared atoms and specific atoms are generated separately. We apply the hNHDP to document modeling and bring in a mechanism to identify discriminative words and topics. We derive an efficient Markov chain Monte Carlo scheme for posterior inference and present experiments on document modeling.
Bayesian Maximum Margin Principal Component Analysis
Du, Changying (Chinese Academy of Sciences) | Zhe, Shandian (Purdue University) | Zhuang, Fuzhen (Chinese Academy of Sciences) | Qi, Yuan (Purdue University) | He, Qing (Chinese Academy of Sciences) | Shi, Zhongzhi (Chinese Academy of Sciences)
Supervised dimensionality reduction has shown great advantages in finding predictive subspaces. Previous methods rarely consider the popular maximum margin principle and are prone to overfitting to usually small training data, especially for those under the maximum likelihood framework. In this paper, we present a posterior-regularized Bayesian approach to combine Principal Component Analysis (PCA) with the max-margin learning. Based on the data augmentation idea for max-margin learning and the probabilistic interpretation of PCA, our method can automatically infer the weight and penalty parameter of max-margin learning machine, while finding the most appropriate PCA subspace simultaneously under the Bayesian framework. We develop a fast mean-field variational inference algorithm to approximate the posterior. Experimental results on various classification tasks show that our method outperforms a number of competitors.
Learning Relational Kalman Filtering
Choi, Jaesik (Ulsan National Institute of Science and Technology) | Amir, Eyal (University of Illinois at Urbana-Champaign) | Xu, Tianfang (University of Illinois at Urbana-Champaign) | Valocchi, Albert J. (University of Illinois at Urbana-Champaign)
The Kalman Filter (KF) is pervasively used to control a vast array of consumer, health and defense products. By grouping sets of symmetric state variables, the Relational Kalman Filter (RKF) enables us to scale the exact KF for large-scale dynamic systems. In this paper, we provide a parameter learning algorithm for RKF, and a regrouping algorithm that prevents the degeneration of the relational structure for efficient filtering. The proposed algorithms significantly expand the applicability of the RKFs by solving the following questions: (1) how to learn parameters for RKF from partial observations; and (2) how to regroup the degenerated state variables by noisy real-world observations. To our knowledge, this is the first paper on learning parameters in relational continuous probabilistic models. We show that our new algorithms significantly improve the accuracy and the efficiency of filtering large-scale dynamic systems.
Chinese Common Noun Phrase Resolution: An Unsupervised Probabilistic Model Rivaling Supervised Resolvers
Chen, Chen (University of Texas at Dallas) | Ng, Vincent (University of Texas at Dallas)
Pronoun resolution and common noun phrase resolution are the two most challenging subtasks of coreference resolution. While a lot of work has focused on pronoun resolution, common noun phrase resolution has almost always been tackled in the context of the larger coreference resolution task. In fact, to our knowledge, there has been no attempt to address Chinese common noun phrase resolution as a standalone task. In this paper, we propose a generative model for unsupervised Chinese common noun phrase resolution that not only allows easy incorporation of linguistic constraints on coreference but also performs joint resolution and anaphoricity determination. When evaluated on the Chinese portion of the OntoNotes 5.0 corpus, our model rivals its supervised counterpart in performance.
Online Bayesian Models for Personal Analytics in Social Media
Volkova, Svitlana (Johns Hopkins University) | Durme, Benjamin Van ( Johns Hopkins University )
Latent author attribute prediction in social media provides a novel set of conditions for the construction of supervised classification models. With individual authors as training and test instances, their associated content ("features") are made available incrementally over time, as they converse over discussion forums. We propose various approaches to handling this dynamic data, from traditional batch training and testing, to incremental bootstrapping, and then active learning via crowdsourcing. Our underlying model relies on an intuitive application of Bayes rule, which should be easy to adopt by the community, thus allowing for a general shift towards online modeling for social media.
Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning
Garrette, Dan (University of Texas at Austin) | Dyer, Chris (Carnegie Mellon University) | Baldridge, Jason (University of Texas at Austin) | Smith, Noah A. (Carnegie Mellon University)
Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Categories are selected from a large, recursively-defined set; this leads to high word-to-category ambiguity, which is one of the primary factors that make learning CCG parsers difficult, especially in the face of little data. Previous work has shown that learning sequence models for CCG tagging can be improved by using linguistically-motivated prior probability distributions over potential categories. We extend this approach to the task of learning a CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.
Plurality Voting Under Uncertainty
Meir, Reshef (Harvard University)
Understanding the nature of strategic voting is the holy grail of social choice theory, where game-theory, social science and recently computational approaches are all applied in order to model the incentives and behavior of voters. In a recent paper, Meir et al.[EC'14] made another step in this direction, by suggesting a behavioral game-theoretic model for voters under uncertainty. For a specific variation of best-response heuristics, they proved initial existence and convergence results in the Plurality voting system. This paper extends the model in multiple directions, considering voters with different uncertainty levels, simultaneous strategic decisions, and a more permissive notion of best-response. It is proved that a voting equilibrium exists even in the most general case. Further, any society voting in an iterative setting is guaranteed to converge to an equilibrium. An alternative behavior is analyzed, where voters try to minimize their worst-case regret. As it turns out, the two behaviors coincide in the simple setting of Meir et al.[EC'14], but not in the general case.