Learning Graphical Models
Understanding the Predictive Power of Computational Mechanics and Echo State Networks in Social Media
Darmon, David, Sylvester, Jared, Girvan, Michelle, Rand, William
ABSTRACT There is a large amount of interest in understanding users of social media in order to predict their behavior in this space. Despite this interest, user predictability in social media is not well-understood. To examine this question, we consider a network of fifteen thousand users on Twitter over a seven week period. We apply two contrasting modeling paradigms: computational mechanics and echo state networks. Both methods attempt to model the behavior of users on the basis of their past behavior. We demonstrate that the behavior of users on Twitter can be well-modeled as processes with self-feedback. We find that the two modeling approaches perform very similarly for most users, but that they differ in performance on a small subset of the users. By exploring the properties of these performance-differentiated users, we highlight the challenges faced in applying predictive models to dynamic social data. I INTRODUCTION At the most abstract level, an individual using a social media service may be viewed as a computational agent [1].
Likelihood Adaptively Modified Penalties
Feng, Yang, Li, Tengfei, Ying, Zhiliang
A new family of penalty functions, adaptive to likelihood, is introduced for model selection in general regression models. It arises naturally through assuming certain types of prior distribution on the regression parameters. To study stability properties of the penalized maximum likelihood estimator, two types of asymptotic stability are defined. Theoretical properties, including the parameter estimation consistency, model selection consistency, and asymptotic stability, are established under suitable regularity conditions. An efficient coordinate-descent algorithm is proposed. Simulation results and real data analysis show that the proposed method has competitive performance in comparison with existing ones.
Top-down particle filtering for Bayesian decision trees
Lakshminarayanan, Balaji, Roy, Daniel M., Teh, Yee Whye
Decision tree learning is a popular approach for classification and regression in machine learning and statistics, and Bayesian formulations---which introduce a prior distribution over decision trees, and formulate learning as posterior inference given data---have been shown to produce competitive performance. Unlike classic decision tree learning algorithms like ID3, C4.5 and CART, which work in a top-down manner, existing Bayesian algorithms produce an approximation to the posterior distribution by evolving a complete tree (or collection thereof) iteratively via local Monte Carlo modifications to the structure of the tree, e.g., using Markov chain Monte Carlo (MCMC). We present a sequential Monte Carlo (SMC) algorithm that instead works in a top-down manner, mimicking the behavior and speed of classic algorithms. We demonstrate empirically that our approach delivers accuracy comparable to the most popular MCMC method, but operates more than an order of magnitude faster, and thus represents a better computation-accuracy tradeoff.
POMDPs under Probabilistic Semantics
Chatterjee, Krishnendu, Chmelík, Martin
We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated to every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) quantitative constraint defines the set of paths where the payoff is at least a given threshold {\lambda} in (0, 1]; and (ii) qualitative constraint which is a special case of quantitative constraint with {\lambda} = 1. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for qualitative path constraint are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For quantitative path constraint we show that the problem of deciding the existence of a finite-memory controller is undecidable.
When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity
Anandkumar, Animashree, Hsu, Daniel, Janzamin, Majid, Kakade, Sham
Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of "higher order" expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allow for overcomplete models, and require the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable w.h.p. in the overcomplete regime. Our identifiability results allows for general (non-degenerate) distributions for modeling the topic proportions, and thus, we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity which is contained in the class of Tucker decompositions, but is more general than the Candecomp/Parafac (CP) decomposition.
Supplement to "Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs"
He, Yangbo, Jia, Jinzhu, Yu, Bin
This supplementary material includes three parts: some preliminary results, four examples, an experiment, three new algorithms, and all proofs of the results in the paper [4]. In this Section, we provide algorithms introduced by Dor and Tarsi [3], and Chickering [1, 2] respectively. These results are necessary to implement our proposed approach technically. Some definitions and notation are introduced first. A directed edge of a DAG is compelled if it occurs in the corresponding completed PDAG, otherwise, the directed edge is reversible and the corresponding parents are reversible parents.
Cyclic Causal Models with Discrete Variables: Markov Chain Equilibrium Semantics and Sample Ordering
Poole, David (University of British Columbia,) | Crowley, Mark (Oregon State University)
We analyze the foundations of cyclic causal models for discrete variables, and compare structural equation models (SEMs) to an alternative semantics as the equilibrium (stationary) distribution of a Markov chain. We show under general conditions, discrete cyclic SEMs cannot have independent noise; even in the simplest case, cyclic structural equation models imply constraints on the noise. We give a formalization of an alternative Markov chain equilibrium semantics which requires not only the causal graph, but also a sample order. We show how the resulting equilibrium is a function of the sample ordering, both theoretically and empirically.
Interactive Value Iteration for Markov Decision Processes with Unknown Rewards
Weng, Paul (LIP6, UPMC) | Zanuttini, Bruno (University of Caen)
To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help comparing sequences of re- wards. We first show how the set of possible reward functions for a given preference relation can be rep- resented as a polytope. Then our algorithm, called Interactive Value Iteration, searches for an optimal policy while refining its knowledge about the pos- sible reward functions, by querying a tutor when necessary. We prove that the number of queries needed before finding an optimal policy is upper- bounded by a polynomial in the size of the problem, and we present experimental results which demon- strate that our approach is efficient in practice.
Incorporating Expert Judgement into Bayesian Network Machine Learning
Zhou, Yun (Queen Mary University of London) | Fenton, Norman (Queen Mary University of London) | Neil, Martin (Queen Mary University of London) | Zhu, Cheng (National University of Defense Technology)
We review the challenges of Bayesian network learning, especially parameter learning, and specify the problem of learning with sparse data. We explain how it is possible to incorporate both qualitative knowledge and data with a multinomial parameter learning method to achieve more accurate predictions with sparse data.
Object Recognition Based on Visual Grammars and Bayesian Networks
Ruiz, Elias (National Institute of Astrophysics, Optics and Electronics) | Sucar, Luis Enrique (National Institute of Astrophysics, Optics and Electronics)
A novel proposal for object recognition based on relational grammars and Bayesian Networks is presented. Based on this grammar an object is represented as a hierarchy of features and spatial relations. This representation is transformed to a Bayesian network structure which parameters are learned from examples. Thus, recognition is based on probabilistic inference in the Bayesian network representation. Preliminary results in modeling natural objects are presented.