Learning Graphical Models
Revisiting Algebra and Complexity of Inference in Graphical Models
Ravanbakhsh, Siamak, Greiner, Russell
This paper studies the form and complexity of inference in graphical models using the abstraction offered by algebraic structures. In particular, we broadly formalize inference problems in graphical models by viewing them as a sequence of operations based on commutative semigroups. We then study the computational complexity of inference by organizing various problems into an "inference hierarchy". When the underlying structure of an inference problem is a commutative semiring (i.e., a combination of two commutative semigroups with the distributive law), a message passing procedure called belief propagation can leverage this distributive law to perform polynomial-time inference for certain problems. After establishing the NP-hardness of inference in any commutative semiring, we investigate the relation between algebraic properties in this setting and further show that polynomial-time inference using the distributive law does not (trivially) extend to inference problems that are expressed using more than two commutative semigroups. We then extend the algebraic treatment of message passing procedures to survey propagation, providing a novel perspective using a combination of two commutative semirings. This formulation generalizes the application of survey propagation to new settings.
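The role of the distributive law can be illustrated on a chain-structured model. The sketch below is our own illustration, not the paper's formalism: it runs one forward message-passing routine under two commutative semirings, (+, ×) for the partition function and (max, ×) for the MAP value, and checks each result against brute-force enumeration. The chain model, the factor tables, and the names `Semiring`, `chain_message_passing` and `brute_force` are hypothetical.

```python
from functools import reduce
from itertools import product
from typing import Callable, NamedTuple

class Semiring(NamedTuple):
    add: Callable[[float, float], float]  # commutative, associative "sum"
    mul: Callable[[float, float], float]  # commutative, associative "product"
    zero: float                           # identity for add
    one: float                            # identity for mul

SUM_PRODUCT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)  # partition function
MAX_PRODUCT = Semiring(max, lambda a, b: a * b, 0.0, 1.0)                 # MAP value (nonnegative factors)

def chain_message_passing(unary, pairwise, sr):
    """Eliminate x_0, x_1, ... in order; the distributive law lets sr.add be
    pushed inside sr.mul, giving O(n k^2) semiring operations."""
    msg = list(unary[0])
    for i in range(1, len(unary)):
        msg = [
            sr.mul(unary[i][b],
                   reduce(sr.add, (sr.mul(msg[a], pairwise[i - 1][a][b]) for a in range(len(msg))), sr.zero))
            for b in range(len(unary[i]))
        ]
    return reduce(sr.add, msg, sr.zero)

def brute_force(unary, pairwise, sr):
    """Apply sr.add over all k^n joint assignments of the product of all factors."""
    total = sr.zero
    for xs in product(*(range(len(u)) for u in unary)):
        w = sr.one
        for i, x in enumerate(xs):
            w = sr.mul(w, unary[i][x])
        for i in range(len(xs) - 1):
            w = sr.mul(w, pairwise[i][xs[i]][xs[i + 1]])
        total = sr.add(total, w)
    return total

# Three binary variables x0 - x1 - x2 with unary and pairwise factors.
unary = [[1.0, 2.0], [0.5, 1.5], [2.0, 1.0]]
pairwise = [[[1.0, 0.5], [0.5, 1.0]], [[2.0, 1.0], [1.0, 2.0]]]
for sr in (SUM_PRODUCT, MAX_PRODUCT):
    print(chain_message_passing(unary, pairwise, sr), brute_force(unary, pairwise, sr))
```

Both lines print matching pairs (20.0 for the partition function, 6.0 for the MAP value). Swapping the semiring changes the inference task without changing the algorithm, which is the abstraction the paper builds on.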
Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data
We consider learning, from strictly behavioral data, the structure and parameters of linear influence games (LIGs), a class of parametric graphical games introduced by Irfan and Ortiz (2014). LIGs facilitate causal strategic inference (CSI): making inferences from causal interventions on stable behavior in strategic settings. Applications include the identification of the most influential individuals in large (social) networks. Such tasks can also support policy-making analysis. Motivated by the computational work on LIGs, we cast the learning problem as maximum-likelihood estimation (MLE) of a generative model defined by pure-strategy Nash equilibria (PSNE). Our simple formulation uncovers the fundamental interplay between goodness-of-fit and model complexity: good models capture equilibrium behavior within the data while controlling the true number of equilibria, including those unobserved. We provide a generalization bound establishing the sample complexity for MLE in our framework. We propose several algorithms, including convex loss minimization (CLM) and sigmoidal approximations. We prove that the number of exact PSNE in LIGs is small, with high probability; thus, CLM is sound. We illustrate our approach on synthetic data and real-world U.S. congressional voting records. We briefly discuss our learning framework's generality and potential applicability to general graphical games.
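In an LIG as introduced by Irfan and Ortiz, each player i picks x_i in {-1, +1}, and its incentive is the linear influence of the other players minus a threshold, sum over j != i of W[i, j] x_j - b_i. A joint action is a PSNE when every player's action agrees in sign with that influence, so no player gains by flipping. The minimal sketch below assumes this standard parameterization (zero-diagonal influence matrix W, threshold vector b) and enumerates the PSNE of a small game by brute force; the function names are hypothetical, and exhaustive enumeration is only feasible for small n. It is not the paper's learning algorithm.

```python
import itertools
import numpy as np

def is_psne(x, W, b):
    """x in {-1,+1}^n is a PSNE iff x_i * (sum_{j != i} W[i, j] x_j - b_i) >= 0 for all i."""
    influence = W @ x - b  # the diagonal of W is assumed to be zero
    return bool(np.all(x * influence >= 0))

def enumerate_psne(W, b):
    """Brute-force search over all 2^n joint actions (small n only)."""
    n = len(b)
    return [x for x in map(np.array, itertools.product([-1, 1], repeat=n)) if is_psne(x, W, b)]

# Three players, each positively influenced by the other two, zero thresholds.
W = np.ones((3, 3)) - np.eye(3)
b = np.zeros(3)
equilibria = enumerate_psne(W, b)
print([x.tolist() for x in equilibria])
```

This coordination game prints `[[-1, -1, -1], [1, 1, 1]]`: only 2 of the 8 joint actions are equilibria, the kind of small equilibrium count the likelihood-based formulation rewards.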
Advanced Mean Field Theory of Restricted Boltzmann Machine
Huang, Haiping, Toyoizumi, Taro
Learning in restricted Boltzmann machines is typically hard due to the computation of the gradients of the log-likelihood function. To describe the network state statistics of the restricted Boltzmann machine, we develop an advanced mean field theory based on the Bethe approximation. Our theory provides an efficient message-passing-based method that evaluates not only the partition function (free energy) but also its gradients without requiring statistical sampling. The results are compared with those obtained by the computationally expensive sampling-based method.
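To make Bethe-approximation message passing concrete, the sketch below runs loopy belief propagation on a tiny RBM and compares the resulting unit magnetizations with exact enumeration. It is a generic illustration under stated assumptions, not the paper's algorithm: units are taken as ±1 (Ising) rather than {0, 1}, only magnetizations are computed (not the free energy or its gradients), and the names `bethe_magnetizations` and `exact_magnetizations` are hypothetical.

```python
import itertools
import numpy as np

def bethe_magnetizations(W, b, c, max_iter=1000, tol=1e-12):
    """Loopy BP on an RBM with +/-1 units, P(v, h) ∝ exp(b·v + c·h + v^T W h).

    u_v2h[i, a]: cavity field that visible unit i sends to hidden unit a.
    u_h2v[i, a]: cavity field that hidden unit a sends to visible unit i.
    """
    tW = np.tanh(W)
    u_v2h = np.zeros_like(W)
    u_h2v = np.zeros_like(W)
    for _ in range(max_iter):
        cav_v = (b + u_h2v.sum(axis=1))[:, None] - u_h2v  # field on v_i excluding h_a
        cav_h = (c + u_v2h.sum(axis=0))[None, :] - u_v2h  # field on h_a excluding v_i
        # Pairwise Ising update: u = atanh(tanh(W_ia) * tanh(cavity field)).
        new_v2h = np.arctanh(tW * np.tanh(cav_v))
        new_h2v = np.arctanh(tW * np.tanh(cav_h))
        delta = max(np.abs(new_v2h - u_v2h).max(), np.abs(new_h2v - u_h2v).max())
        u_v2h, u_h2v = new_v2h, new_h2v
        if delta < tol:
            break
    return np.tanh(b + u_h2v.sum(axis=1)), np.tanh(c + u_v2h.sum(axis=0))

def exact_magnetizations(W, b, c):
    """Exact <v>, <h> by enumerating all 2^(n_v + n_h) states (tiny models only)."""
    n_v, n_h = W.shape
    Z, m_v, m_h = 0.0, np.zeros(n_v), np.zeros(n_h)
    for s in itertools.product([-1.0, 1.0], repeat=n_v + n_h):
        v, h = np.array(s[:n_v]), np.array(s[n_v:])
        w = np.exp(b @ v + c @ h + v @ W @ h)
        Z += w
        m_v += w * v
        m_h += w * h
    return m_v / Z, m_h / Z

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 3))  # weak couplings
b = 0.2 * rng.standard_normal(4)
c = 0.2 * rng.standard_normal(3)
bp_v, bp_h = bethe_magnetizations(W, b, c)
ex_v, ex_h = exact_magnetizations(W, b, c)
print(np.abs(bp_v - ex_v).max(), np.abs(bp_h - ex_h).max())
```

The bipartite RBM graph has loops, so the Bethe result is approximate; for weak couplings like these the discrepancy from exact enumeration is small, while the BP cost grows only with the number of edges.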
A Compositional Framework for Grounding Language Inference, Generation, and Acquisition in Video
Yu, Haonan, Siddharth, N., Barbu, Andrei, Siskind, Jeffrey Mark
We present an approach to simultaneously reasoning about a video clip and an entire natural-language sentence. The compositional nature of language is exploited to construct models that represent the meanings of entire sentences, composed from the meanings of the words in those sentences and mediated by a grammar that encodes the predicate-argument relations. We demonstrate that these models faithfully represent the meanings of sentences and are sensitive to how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions) affect the meaning of a sentence and how it is grounded in video. We exploit this methodology in three ways. In the first, a video clip and a sentence are taken as input, and the participants in the event described by the sentence are highlighted, even when the clip depicts multiple similar simultaneous events. In the second, a video clip is taken as input without a sentence, and a sentence is generated that describes an event in that clip. In the third, a corpus of video clips is paired with sentences that describe some of the events in those clips, and the meanings of the words in those sentences are learned. We learn these meanings without needing to specify which attribute of the video clips each word in a given sentence refers to. The learned meaning representations are shown to be intelligible to humans.
On the Relationship between Sum-Product Networks and Bayesian Networks
Zhao, Han, Melibari, Mazen, Poupart, Pascal
In this paper, we establish some theoretical connections between Sum-Product Networks (SPNs) and Bayesian Networks (BNs). We prove that every SPN can be converted into a BN in linear time and space in terms of the network size. The key insight is to use Algebraic Decision Diagrams (ADDs) to compactly represent the local conditional probability distributions at each node in the resulting BN by exploiting context-specific independence (CSI). The generated BN has a simple directed bipartite graphical structure. We show that by applying the Variable Elimination algorithm (VE) to the generated BN with ADD representations, we can recover the original SPN, which can be viewed as a history record or cache of the VE inference process. To help state the proof clearly, we introduce the notion of a normal SPN and present a theoretical analysis of the consistency and decomposability properties. We conclude the paper with some discussion of the implications of the proof and establish a connection between the depth of an SPN and a lower bound on the tree-width of its corresponding BN.
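Completeness (children of a sum node share one scope) and decomposability (children of a product node have disjoint scopes) are standard sufficient conditions for bottom-up SPN evaluation to compute probabilities; decomposability implies the weaker consistency property. The sketch below assumes Boolean variables with univariate Bernoulli leaves and normalized sum weights, and uses hypothetical class and function names. It checks both conditions on a toy SPN, verifies that the SPN sums to one over all assignments, and marginalizes a variable by setting its leaves to 1. It does not implement the SPN-to-BN conversion.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Leaf:
    var: int
    p_true: float  # univariate Bernoulli leaf: P(X_var = 1)

@dataclass
class SumNode:
    children: list  # (weight, child) pairs; weights assumed positive and summing to 1

@dataclass
class ProductNode:
    children: list  # child nodes

def kids(node):
    return [ch for _, ch in node.children] if isinstance(node, SumNode) else node.children

def scope(node):
    if isinstance(node, Leaf):
        return frozenset([node.var])
    return frozenset().union(*(scope(ch) for ch in kids(node)))

def complete_and_decomposable(node):
    """Sum nodes: all children share one scope. Product nodes: children's scopes are disjoint."""
    if isinstance(node, Leaf):
        return True
    scopes = [scope(ch) for ch in kids(node)]
    if isinstance(node, SumNode):
        ok = len(set(scopes)) == 1
    else:
        ok = sum(len(s) for s in scopes) == len(frozenset().union(*scopes))
    return ok and all(complete_and_decomposable(ch) for ch in kids(node))

def evaluate(node, x):
    """Bottom-up evaluation; x maps each variable to 0, 1, or None (marginalized out)."""
    if isinstance(node, Leaf):
        val = x.get(node.var)
        if val is None:
            return 1.0  # a normalized leaf summed over both values gives 1
        return node.p_true if val == 1 else 1.0 - node.p_true
    if isinstance(node, SumNode):
        return sum(w * evaluate(ch, x) for w, ch in node.children)
    result = 1.0
    for ch in node.children:
        result *= evaluate(ch, x)
    return result

spn = SumNode([
    (0.6, ProductNode([Leaf(0, 0.9), Leaf(1, 0.2)])),
    (0.4, ProductNode([Leaf(0, 0.1), Leaf(1, 0.7)])),
])
total = sum(evaluate(spn, {0: a, 1: b}) for a, b in product([0, 1], repeat=2))
print(complete_and_decomposable(spn), total, evaluate(spn, {0: 1, 1: None}))
```

Up to floating-point rounding, the total is 1.0 and the marginal P(X0 = 1) is 0.6 × 0.9 + 0.4 × 0.1 = 0.58, obtained in a single bottom-up pass.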
Maximum a Posteriori Estimation by Search in Probabilistic Programs
We introduce an approximate search algorithm for fast maximum a posteriori probability estimation in probabilistic programs, which we call Bayesian ascent Monte Carlo (BaMC). Probabilistic programs represent probabilistic models with a varying number of mutually dependent finite, countable, and continuous random variables. BaMC is an anytime MAP search algorithm applicable to any combination of random variables and dependencies. We compare BaMC to other MAP estimation algorithms and show that BaMC is faster and more robust on a range of probabilistic models.
Bayesian kernel-based system identification with quantized output data
Bottegal, Giulio, Pillonetto, Gianluigi, Hjalmarsson, Håkan
In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo (MCMC) methods to provide an estimate of the system. In particular, we show how to design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods when employed in identification of systems with quantized data.
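A Gibbs sampler of this kind can be sketched for the special case of a 1-bit (sign) quantizer using standard data augmentation; the paper's sampler may differ. Several assumptions are not taken from the source: a finite impulse response of length n_g, the first-order stable spline (also called TC) kernel K[i, j] = alpha^max(i, j), noise variance and kernel hyperparameters held fixed rather than sampled, and hypothetical function names. Each sweep draws the unquantized outputs z from Gaussians truncated to their quantization cells (via `scipy.stats.truncnorm`), then draws the impulse response g from its Gaussian conditional.

```python
import numpy as np
from scipy.stats import truncnorm

def stable_spline_kernel(n, alpha):
    """First-order stable spline (TC) kernel, K[i, j] = alpha ** max(i, j) for i, j = 1..n."""
    idx = np.arange(1, n + 1)
    return alpha ** np.maximum.outer(idx, idx)

def regressor_matrix(u, n_g):
    """Phi[t, k] = u[t - k], assuming zero input before t = 0, so that z = Phi @ g."""
    T = len(u)
    Phi = np.zeros((T, n_g))
    for k in range(n_g):
        Phi[k:, k] = u[: T - k]
    return Phi

def gibbs_sign_quantized(Phi, y, alpha=0.8, lam=1.0, sigma=0.3,
                         n_samples=300, burn_in=100, seed=0):
    """Gibbs sampler for y_t = sign(z_t), z = Phi g + e, e ~ N(0, sigma^2 I),
    g ~ N(0, lam * K), with fixed hyperparameters. Returns the posterior mean of g."""
    rng = np.random.default_rng(seed)
    n_g = Phi.shape[1]
    # The precision of g | z does not depend on z, so factor it once.
    prec = Phi.T @ Phi / sigma**2 + np.linalg.inv(lam * stable_spline_kernel(n_g, alpha))
    chol = np.linalg.cholesky(prec)  # prec = chol @ chol.T
    # Quantization cell of each output: y = +1 -> [0, inf), y = -1 -> (-inf, 0).
    lo = np.where(y > 0, 0.0, -np.inf)
    hi = np.where(y > 0, np.inf, 0.0)
    g = np.zeros(n_g)
    draws = []
    for it in range(burn_in + n_samples):
        # Step 1: z | g, y ~ N(Phi g, sigma^2), truncated to each quantization cell.
        m = Phi @ g
        z = truncnorm.rvs((lo - m) / sigma, (hi - m) / sigma, loc=m, scale=sigma,
                          size=m.shape, random_state=rng)
        # Step 2: g | z ~ N(prec^{-1} Phi^T z / sigma^2, prec^{-1}); chol^{-T} eps has covariance prec^{-1}.
        mean = np.linalg.solve(prec, Phi.T @ z / sigma**2)
        g = mean + np.linalg.solve(chol.T, rng.standard_normal(n_g))
        if it >= burn_in:
            draws.append(g)
    return np.mean(draws, axis=0)

rng = np.random.default_rng(1)
T, n_g = 300, 10
g_true = 0.8 ** np.arange(1, n_g + 1)  # a stable impulse response
u = rng.standard_normal(T)
Phi = regressor_matrix(u, n_g)
y = np.where(Phi @ g_true + 0.3 * rng.standard_normal(T) >= 0, 1, -1)
g_hat = gibbs_sign_quantized(Phi, y)
cosine = g_hat @ g_true / (np.linalg.norm(g_hat) * np.linalg.norm(g_true))
print(np.round(g_hat, 3), round(float(cosine), 3))
```

Since a sign quantizer discards amplitude, the cosine similarity with `g_true` is a simple check that the sampler recovers the shape of the impulse response.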
A Prior Distribution over Directed Acyclic Graphs for Sparse Bayesian Networks
Rios, Felix L., Noble, John M., Koski, Timo J. T.
The main contribution of this article is a new prior distribution over directed acyclic graphs, which gives larger weight to sparse graphs. This distribution is intended for structured Bayesian networks, where the structure is given by an ordered block model. That is, the nodes of the graph are objects which fall into categories (or blocks); the blocks have a natural ordering. The presence of a relationship between two objects is denoted by an arrow, from the object of lower category to the object of higher category. The models considered here were introduced in Kemp et al. (2004) for relational data and extended to multivariate data in Mansinghka et al. (2006). The prior over graph structures presented here has an explicit formula. The number of nodes in each layer of the graph follows a Hoppe-Ewens urn model. We consider the situation where the nodes of the graph represent random variables, whose joint probability distribution factorises along the DAG. We describe Monte Carlo schemes for finding the optimal a posteriori structure given a data matrix and compare the performance with Mansinghka et al. (2006) and also with the uniform prior.
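The generative picture (block sizes from a Hoppe-Ewens urn, arrows only from a lower block to a higher block) can be sketched as follows. This is not the paper's prior and does not reproduce its explicit formula: the concentration parameter `theta`, ordering blocks by their creation order, and the independent edge probability `edge_prob` are illustrative assumptions, and all function names are hypothetical. The sketch shows why every graph generated this way is acyclic.

```python
import random

def hoppe_urn_blocks(n_nodes, theta, rng):
    """Block label per node. With m nodes already placed, the next node opens a new
    block with probability theta / (m + theta); otherwise it joins an existing
    block with probability proportional to that block's current size."""
    labels, sizes = [], []
    for m in range(n_nodes):
        if rng.random() < theta / (m + theta):
            sizes.append(1)
            labels.append(len(sizes) - 1)  # blocks ordered by creation (assumption)
        else:
            k = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[k] += 1
            labels.append(k)
    return labels

def sample_block_dag(labels, edge_prob, rng):
    """Arrows only from a lower block to a strictly higher block, each kept
    independently with probability edge_prob (a hypothetical sparsity knob)."""
    n = len(labels)
    return [(i, j) for i in range(n) for j in range(n)
            if labels[i] < labels[j] and rng.random() < edge_prob]

def is_acyclic(n, edges):
    """Kahn's algorithm: the graph is a DAG iff all nodes can be removed in topological order."""
    indegree = [0] * n
    children = [[] for _ in range(n)]
    for i, j in edges:
        children[i].append(j)
        indegree[j] += 1
    stack = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while stack:
        v = stack.pop()
        removed += 1
        for w in children[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                stack.append(w)
    return removed == n

rng = random.Random(0)
labels = hoppe_urn_blocks(n_nodes=12, theta=1.5, rng=rng)
edges = sample_block_dag(labels, edge_prob=0.2, rng=rng)
print(labels, len(edges), is_acyclic(len(labels), edges))
```

Because every arrow increases the block index, no directed cycle can form; a smaller `edge_prob` yields sparser graphs.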
A Bayesian approach for structure learning in oscillating regulatory networks
Trejo, D, Millar, AJ, Sanguinetti, G
Oscillations lie at the core of many biological processes, from the cell cycle to circadian oscillations and developmental processes. Time-keeping mechanisms are essential to enable organisms to adapt to varying conditions in environmental cycles, from day/night to seasonal. Transcriptional regulatory networks are one of the mechanisms behind these biological oscillations. However, while identifying cyclically expressed genes from time series measurements is relatively easy, determining the structure of the interaction network underpinning the oscillation is a far more challenging problem. Here, we explicitly leverage the oscillatory nature of the transcriptional signals and present a method for reconstructing network interactions tailored to this special but important class of genetic circuits. Our method is based on projecting the signal onto a set of oscillatory basis functions using a Discrete Fourier Transform. We build a Bayesian hierarchical model within a frequency-domain linear model in order to enforce sparsity and incorporate prior knowledge about the network structure. Experiments on real and simulated data show that the method can lead to substantial improvements over competing approaches if the oscillatory assumption is met, and remains competitive in cases where it is not.
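The frequency-domain idea can be illustrated with a plain least-squares stand-in for the paper's Bayesian hierarchical model. Because the DFT is linear, a time-domain relation y(t) = sum_j beta_j x_j(t) carries over to every Fourier coefficient, so regulators can be fitted on a handful of low-frequency coefficients. The sketch assumes evenly sampled, noiseless toy signals, real coefficients beta, and hypothetical function names; it has no sparsity prior, which is what the hierarchical model adds.

```python
import numpy as np

def fourier_features(x, n_freqs):
    """Project a real, evenly sampled signal onto the DFT basis,
    keeping the first n_freqs non-negative frequencies."""
    return np.fft.rfft(x)[:n_freqs]

def fit_frequency_domain(target, regulators, n_freqs):
    """Least squares for real beta in Y(w) = sum_j beta_j X_j(w), stacking the
    real and imaginary parts so that beta stays real."""
    Y = fourier_features(target, n_freqs)
    X = np.column_stack([fourier_features(r, n_freqs) for r in regulators])
    A = np.vstack([X.real, X.imag])
    rhs = np.concatenate([Y.real, Y.imag])
    beta, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return beta

t = np.arange(48)                      # e.g. hourly samples over two days
x1 = np.cos(2 * np.pi * t / 24)        # 24 h rhythm
x2 = np.sin(2 * np.pi * t / 12)        # 12 h rhythm
x3 = np.cos(2 * np.pi * t / 24 + 1.0)  # phase-shifted 24 h rhythm, not a regulator
y = 1.5 * x1 - 0.7 * x2
beta = fit_frequency_domain(y, [x1, x2, x3], n_freqs=6)
print(np.round(beta, 6))
```

On these noiseless signals the fit recovers approximately [1.5, -0.7, 0]; the phase information in the complex coefficients is what separates x1 from the same-period candidate x3.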
Subjectivity, Bayesianism, and Causality
Bayesian probability theory is one of the most successful frameworks to model reasoning under uncertainty. Its defining property is the interpretation of probabilities as degrees of belief in propositions about the state of the world relative to an inquiring subject. This essay examines the notion of subjectivity by drawing parallels between Lacanian theory and Bayesian probability theory, and concludes that the latter must be enriched with causal interventions to model agency. The central contribution of this work is an abstract model of the subject that accommodates causal interventions in a measure-theoretic formalisation. This formalisation is obtained through a game-theoretic Ansatz based on modelling the inside and outside of the subject as an extensive-form game with imperfect information between two players. Finally, I illustrate the expressiveness of this model with an example of causal induction.