Goto

Collaborating Authors

 Directed Networks


On the Relationship between Sum-Product Networks and Bayesian Networks

arXiv.org Artificial Intelligence

In this paper, we establish some theoretical connections between Sum-Product Networks (SPNs) and Bayesian Networks (BNs). We prove that every SPN can be converted into a BN in linear time and space in terms of the network size. The key insight is to use Algebraic Decision Diagrams (ADDs) to compactly represent the local conditional probability distributions at each node in the resulting BN by exploiting context-specific independence (CSI). The generated BN has a simple directed bipartite graphical structure. We show that by applying the Variable Elimination algorithm (VE) to the generated BN with ADD representations, we can recover the original SPN where the SPN can be viewed as a history record or caching of the VE inference process. To help state the proof clearly, we introduce the notion of {\em normal} SPN and present a theoretical analysis of the consistency and decomposability properties. We conclude the paper with some discussion of the implications of the proof and establish a connection between the depth of an SPN and a lower bound of the tree-width of its corresponding BN.


Bayesian kernel-based system identification with quantized output data

arXiv.org Machine Learning

In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo (MCMC) methods to provide an estimate of the system. In particular, we show how to design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods when employed in identification of systems with quantized data.1. INTRODUCTION Identification of systems from quantized data finds applications in a wide range of areas such as communications, networked control systems, bioinformatics (see e.g.


A Prior Distribution over Directed Acyclic Graphs for Sparse Bayesian Networks

arXiv.org Machine Learning

The main contribution of this article is a new prior distribution over directed acyclic graphs, which gives larger weight to sparse graphs. This distribution is intended for structured Bayesian networks, where the structure is given by an ordered block model. That is, the nodes of the graph are objects which fall into categories (or blocks); the blocks have a natural ordering. The presence of a relationship between two objects is denoted by an arrow, from the object of lower category to the object of higher category. The models considered here were introduced in Kemp et al. (2004) for relational data and extended to multivariate data in Mansinghka et al. (2006). The prior over graph structures presented here has an explicit formula. The number of nodes in each layer of the graph follow a Hoppe Ewens urn model. We consider the situation where the nodes of the graph represent random variables, whose joint probability distribution factorises along the DAG. We describe Monte Carlo schemes for finding the optimal aposteriori structure given a data matrix and compare the performance with Mansinghka et al. (2006) and also with the uniform prior.


A Bayesian approach for structure learning in oscillating regulatory networks

arXiv.org Machine Learning

Oscillations lie at the core of many biological processes, from the cell cycle, to circadian oscillations and developmental processes. Time-keeping mechanisms are essential to enable organisms to adapt to varying conditions in environmental cycles, from day/night to seasonal. Transcriptional regulatory networks are one of the mechanisms behind these biological oscillations. However, while identifying cyclically expressed genes from time series measurements is relatively easy, determining the structure of the interaction network underpinning the oscillation is a far more challenging problem. Here, we explicitly leverage the oscillatory nature of the transcriptional signals and present a method for reconstructing network interactions tailored to this special but important class of genetic circuits. Our method is based on projecting the signal onto a set of oscillatory basis functions using a Discrete Fourier Transform. We build a Bayesian Hierarchical model within a frequency domain linear model in order to enforce sparsity and incorporate prior knowledge about the network structure. Experiments on real and simulated data show that the method can lead to substantial improvements over competing approaches if the oscillatory assumption is met, and remains competitive also in cases it is not.


Subjectivity, Bayesianism, and Causality

arXiv.org Machine Learning

Bayesian probability theory is one of the most successful frameworks to model reasoning under uncertainty. Its defining property is the interpretation of probabilities as degrees of belief in propositions about the state of the world relative to an inquiring subject. This essay examines the notion of subjectivity by drawing parallels between Lacanian theory and Bayesian probability theory, and concludes that the latter must be enriched with causal interventions to model agency. The central contribution of this work is an abstract model of the subject that accommodates causal interventions in a measure-theoretic formalisation. This formalisation is obtained through a game-theoretic Ansatz based on modelling the inside and outside of the subject as an extensive-form game with imperfect information between two players. Finally, I illustrate the expressiveness of this model with an example of causal induction.


A local approach to estimation in discrete loglinear models

arXiv.org Machine Learning

We consider two connected aspects of maximum likelihood estimation of the parameter for high-dimensional discrete graphical models: the existence of the maximum likelihood estimate (mle) and its computation. When the data is sparse, there are many zeros in the contingency table and the maximum likelihood estimate of the parameter may not exist. Fienberg and Rinaldo (2012) have shown that the mle does not exists iff the data vector belongs to a face of the so-called marginal cone spanned by the rows of the design matrix of the model. Identifying these faces in high-dimension is challenging. In this paper, we take a local approach : we show that one such face, albeit possibly not the smallest one, can be identified by looking at a collection of marginal graphical models generated by induced subgraphs $G_i,i=1,\ldots,k$ of $G$. This is our first contribution. Our second contribution concerns the composite maximum likelihood estimate. When the dimension of the problem is large, estimating the parameters of a given graphical model through maximum likelihood is challenging, if not impossible. The traditional approach to this problem has been local with the use of composite likelihood based on local conditional likelihoods. A more recent development is to have the components of the composite likelihood be marginal likelihoods centred around each $v$. We first show that the estimates obtained by consensus through local conditional and marginal likelihoods are identical. We then study the asymptotic properties of the composite maximum likelihood estimate when both the dimension of the model and the sample size $N$ go to infinity.


Asymptotic Accuracy of Bayesian Estimation for a Single Latent Variable

arXiv.org Machine Learning

In data science and machine learning, hierarchical parametric models, such as mixture models, are often used. They contain two kinds of variables: observable variables, which represent the parts of the data that can be directly measured, and latent variables, which represent the underlying processes that generate the data. Although there has been an increase in research on the estimation accuracy for observable variables, the theoretical analysis of estimating latent variables has not been thoroughly investigated. In a previous study, we determined the accuracy of a Bayes estimation for the joint probability of the latent variables in a dataset, and we proved that the Bayes method is asymptotically more accurate than the maximum-likelihood method. However, the accuracy of the Bayes estimation for a single latent variable remains unknown. In the present paper, we derive the asymptotic expansions of the error functions, which are defined by the Kullback-Leibler divergence, for two types of single-variable estimations when the statistical regularity is satisfied. Our results indicate that the accuracies of the Bayes and maximum-likelihood methods are asymptotically equivalent and clarify that the Bayes method is only advantageous for multivariable estimations.


Probabilistic Clustering of Time-Evolving Distance Data

arXiv.org Machine Learning

We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method utilizes the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and no identification on the identities of the objects is needed. Further, the model does not require the number of clusters being specified in advance -- they are instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time.


Topic Modeling of Hierarchical Corpora

arXiv.org Machine Learning

We study the problem of topic modeling in corpora whose documents are organized in a multi-level hierarchy. We explore a parametric approach to this problem, assuming that the number of topics is known or can be estimated by cross-validation. The models we consider can be viewed as special (finite-dimensional) instances of hierarchical Dirichlet processes (HDPs). For these models we show that there exists a simple variational approximation for probabilistic inference. The approximation relies on a previously unexploited inequality that handles the conditional dependence between Dirichlet latent variables in adjacent levels of the model's hierarchy. We compare our approach to existing implementations of nonparametric HDPs. On several benchmarks we find that our approach is faster than Gibbs sampling and able to learn more predictive models than existing variational methods. Finally, we demonstrate the large-scale viability of our approach on two newly available corpora from researchers in computer security---one with 350,000 documents and over 6,000 internal subcategories, the other with a five-level deep hierarchy.


Geometric Representations of Random Hypergraphs

arXiv.org Machine Learning

A parametrization of hypergraphs based on the geometry of points in $\mathbf{R}^d$ is developed. Informative prior distributions on hypergraphs are induced through this parametrization by priors on point configurations via spatial processes. This prior specification is used to infer conditional independence models or Markov structure of multivariate distributions. Specifically, we can recover both the junction tree factorization as well as the hyper Markov law. This approach offers greater control on the distribution of graph features than Erd\"os-R\'enyi random graphs, supports inference of factorizations that cannot be retrieved by a graph alone, and leads to new Metropolis\slash Hastings Markov chain Monte Carlo algorithms with both local and global moves in graph space. We illustrate the utility of this parametrization and prior specification using simulations.