Uncertainty
Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data
We consider learning, from strictly behavioral data, the structure and parameters of linear influence games (LIGs), a class of parametric graphical games introduced by Irfan and Ortiz (2014). LIGs facilitate causal strategic inference (CSI): Making inferences from causal interventions on stable behavior in strategic settings. Applications include the identification of the most influential individuals in large (social) networks. Such tasks can also support policy-making analysis. Motivated by the computational work on LIGs, we cast the learning problem as maximum-likelihood estimation (MLE) of a generative model defined by pure-strategy Nash equilibria (PSNE). Our simple formulation uncovers the fundamental interplay between goodness-of-fit and model complexity: good models capture equilibrium behavior within the data while controlling the true number of equilibria, including those unobserved. We provide a generalization bound establishing the sample complexity for MLE in our framework. We propose several algorithms including convex loss minimization (CLM) and sigmoidal approximations. We prove that the number of exact PSNE in LIGs is small, with high probability; thus, CLM is sound. We illustrate our approach on synthetic data and real-world U.S. congressional voting records. We briefly discuss our learning framework's generality and potential applicability to general graphical games.
Structured Block Basis Factorization for Scalable Kernel Matrix Evaluation
Wang, Ruoxi, Li, Yingzhou, Mahoney, Michael W., Darve, Eric
Kernel matrices are popular in machine learning and scientific computing, but they are limited by their quadratic complexity in both construction and storage. It is well-known that as one varies the kernel parameter, e.g., the width parameter in radial basis function kernels, the kernel matrix changes from a smooth low-rank kernel to a diagonally-dominant and then fully-diagonal kernel. Low-rank approximation methods have been widely-studied, mostly in the first case, to reduce the memory storage and the cost of computing matrix-vector products. Here, we use ideas from scientific computing to propose an extension of these methods to situations where the matrix is not well-approximated by a low-rank matrix. In particular, we construct an efficient block low-rank approximation method---which we call the Block Basis Factorization---and we show that it has $\mathcal{O}(n)$ complexity in both time and memory. Our method works for a wide range of kernel parameters, extending the domain of applicability of low-rank approximation methods, and our empirical results demonstrate the stability (small standard deviation in error) and superiority over current state-of-art kernel approximation algorithms.
A Compositional Framework for Grounding Language Inference, Generation, and Acquisition in Video
Yu, Haonan, Siddharth, N., Barbu, Andrei, Siskind, Jeffrey Mark
We present an approach to simultaneously reasoning about a video clip and an entire natural-language sentence. The compositional nature of language is exploited to construct models which represent the meanings of entire sentences composed out of the meanings of the words in those sentences mediated by a grammar that encodes the predicate-argument relations. We demonstrate that these models faithfully represent the meanings of sentences and are sensitive to how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions) affect the meaning of a sentence and how it is grounded in video. We exploit this methodology in three ways. In the first, a video clip along with a sentence are taken as input and the participants in the event described by the sentence are highlighted, even when the clip depicts multiple similar simultaneous events. In the second, a video clip is taken as input without a sentence and a sentence is generated that describes an event in that clip. In the third, a corpus of video clips is paired with sentences which describe some of the events in those clips and the meanings of the words in those sentences are learned. We learn these meanings without needing to specify which attribute of the video clips each word in a given sentence refers to. The learned meaning representations are shown to be intelligible to humans.
On the Relationship between Sum-Product Networks and Bayesian Networks
Zhao, Han, Melibari, Mazen, Poupart, Pascal
In this paper, we establish some theoretical connections between Sum-Product Networks (SPNs) and Bayesian Networks (BNs). We prove that every SPN can be converted into a BN in linear time and space in terms of the network size. The key insight is to use Algebraic Decision Diagrams (ADDs) to compactly represent the local conditional probability distributions at each node in the resulting BN by exploiting context-specific independence (CSI). The generated BN has a simple directed bipartite graphical structure. We show that by applying the Variable Elimination algorithm (VE) to the generated BN with ADD representations, we can recover the original SPN where the SPN can be viewed as a history record or caching of the VE inference process. To help state the proof clearly, we introduce the notion of {\em normal} SPN and present a theoretical analysis of the consistency and decomposability properties. We conclude the paper with some discussion of the implications of the proof and establish a connection between the depth of an SPN and a lower bound of the tree-width of its corresponding BN.
Maximum a Posteriori Estimation by Search in Probabilistic Programs
We introduce an approximate search algorithm for fast maximum a posteriori probability estimation in probabilistic programs, which we call Bayesian ascent Monte Carlo (BaMC). Probabilistic programs represent probabilistic models with varying number of mutually dependent finite, countable, and continuous random variables. BaMC is an anytime MAP search algorithm applicable to any combination of random variables and dependencies. We compare BaMC to other MAP estimation algorithms and show that BaMC is faster and more robust on a range of probabilistic models.
Bayesian kernel-based system identification with quantized output data
Bottegal, Giulio, Pillonetto, Gianluigi, Hjalmarsson, Hรฅkan
In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo (MCMC) methods to provide an estimate of the system. In particular, we show how to design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods when employed in identification of systems with quantized data.1. INTRODUCTION Identification of systems from quantized data finds applications in a wide range of areas such as communications, networked control systems, bioinformatics (see e.g.
A Prior Distribution over Directed Acyclic Graphs for Sparse Bayesian Networks
Rios, Felix L., Noble, John M., Koski, Timo J. T.
The main contribution of this article is a new prior distribution over directed acyclic graphs, which gives larger weight to sparse graphs. This distribution is intended for structured Bayesian networks, where the structure is given by an ordered block model. That is, the nodes of the graph are objects which fall into categories (or blocks); the blocks have a natural ordering. The presence of a relationship between two objects is denoted by an arrow, from the object of lower category to the object of higher category. The models considered here were introduced in Kemp et al. (2004) for relational data and extended to multivariate data in Mansinghka et al. (2006). The prior over graph structures presented here has an explicit formula. The number of nodes in each layer of the graph follow a Hoppe Ewens urn model. We consider the situation where the nodes of the graph represent random variables, whose joint probability distribution factorises along the DAG. We describe Monte Carlo schemes for finding the optimal aposteriori structure given a data matrix and compare the performance with Mansinghka et al. (2006) and also with the uniform prior.
A Bayesian approach for structure learning in oscillating regulatory networks
Trejo, D, Millar, AJ, Sanguinetti, G
Oscillations lie at the core of many biological processes, from the cell cycle, to circadian oscillations and developmental processes. Time-keeping mechanisms are essential to enable organisms to adapt to varying conditions in environmental cycles, from day/night to seasonal. Transcriptional regulatory networks are one of the mechanisms behind these biological oscillations. However, while identifying cyclically expressed genes from time series measurements is relatively easy, determining the structure of the interaction network underpinning the oscillation is a far more challenging problem. Here, we explicitly leverage the oscillatory nature of the transcriptional signals and present a method for reconstructing network interactions tailored to this special but important class of genetic circuits. Our method is based on projecting the signal onto a set of oscillatory basis functions using a Discrete Fourier Transform. We build a Bayesian Hierarchical model within a frequency domain linear model in order to enforce sparsity and incorporate prior knowledge about the network structure. Experiments on real and simulated data show that the method can lead to substantial improvements over competing approaches if the oscillatory assumption is met, and remains competitive also in cases it is not.
Subjectivity, Bayesianism, and Causality
Bayesian probability theory is one of the most successful frameworks to model reasoning under uncertainty. Its defining property is the interpretation of probabilities as degrees of belief in propositions about the state of the world relative to an inquiring subject. This essay examines the notion of subjectivity by drawing parallels between Lacanian theory and Bayesian probability theory, and concludes that the latter must be enriched with causal interventions to model agency. The central contribution of this work is an abstract model of the subject that accommodates causal interventions in a measure-theoretic formalisation. This formalisation is obtained through a game-theoretic Ansatz based on modelling the inside and outside of the subject as an extensive-form game with imperfect information between two players. Finally, I illustrate the expressiveness of this model with an example of causal induction.
Graphical Fermat's Principle and Triangle-Free Graph Estimation
We consider the problem of estimating undirected triangle-free graphs of high dimensional distributions. Triangle-free graphs form a rich graph family which allows arbitrary loopy structures but 3-cliques. For inferential tractability, we propose a graphical Fermat's principle to regularize the distribution family. Such principle enforces the existence of a distribution-dependent pseudo-metric such that any two nodes have a smaller distance than that of two other nodes who have a geodesic path include these two nodes. Guided by this principle, we show that a greedy strategy is able to recover the true graph. The resulting algorithm only requires a pairwise distance matrix as input and is computationally even more efficient than calculating the minimum spanning tree. We consider graph estimation problems under different settings, including discrete and nonparametric distribution families. Thorough numerical results are provided to illustrate the usefulness of the proposed method.