Goto

Collaborating Authors

 Bayesian Learning


An Efficient Algorithm for Computing Interventional Distributions in Latent Variable Causal Models

arXiv.org Machine Learning

Probabilistic inference in graphical models is the task of computing marginal and conditional densities of interest from a factorized representation of a joint probability distribution. Inference algorithms such as variable elimination and belief propagation take advantage of constraints embedded in this factorization to compute such densities efficiently. In this paper, we propose an algorithm which computes interventional distributions in latent variable causal models represented by acyclic directed mixed graphs (ADMGs). To compute these distributions efficiently, we take advantage of a recursive factorization which generalizes the usual Markov factorization for DAGs and the more recent factorization for ADMGs. Our algorithm can be viewed as a generalization of variable elimination to the mixed graph case. We show our algorithm is exponential in the mixed graph generalization of treewidth.


Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks

arXiv.org Machine Learning

Markov jump processes and continuous time Bayesian networks are important classes of continuous time dynamical systems. In this paper, we tackle the problem of inferring unobserved paths in these models by introducing a fast auxiliary variable Gibbs sampler. Our approach is based on the idea of uniformization, and sets up a Markov chain over paths by sampling a finite set of virtual jump times and then running a standard hidden Markov model forward filtering-backward sampling algorithm over states at the set of extant and virtual jump times. We demonstrate significant computational benefits over a state-of-the-art Gibbs sampler on a number of continuous time Bayesian networks.


Partial Order MCMC for Structure Discovery in Bayesian Networks

arXiv.org Machine Learning

We present a new Markov chain Monte Carlo method for estimating posterior probabilities of structural features in Bayesian networks. The method draws samples from the posterior distribution of partial orders on the nodes; for each sampled partial order, the conditional probabilities of interest are computed exactly. We give both analytical and empirical results that suggest the superiority of the new method compared to previous methods, which sample either directed acyclic graphs or linear orders on the nodes.


Asymptotic Efficiency of Deterministic Estimators for Discrete Energy-Based Models: Ratio Matching and Pseudolikelihood

arXiv.org Machine Learning

Standard maximum likelihood estimation cannot be applied to discrete energy-based models in the general case because the computation of exact model probabilities is intractable. Recent research has seen the proposal of several new estimators designed specifically to overcome this intractability, but virtually nothing is known about their theoretical properties. In this paper, we present a generalized estimator that unifies many of the classical and recently proposed estimators. We use results from the standard asymptotic theory for M-estimators to derive a generic expression for the asymptotic covariance matrix of our generalized estimator. We apply these results to study the relative statistical efficiency of classical pseudolikelihood and the recently-proposed ratio matching estimator.


Noisy-OR Models with Latent Confounding

arXiv.org Machine Learning

Given a set of experiments in which varying subsets of observed variables are subject to intervention, we consider the problem of identifiability of causal models exhibiting latent confounding. While identifiability is trivial when each experiment intervenes on a large number of variables, the situation is more complicated when only one or a few variables are subject to intervention per experiment. For linear causal models with latent variables Hyttinen et al. (2010) gave precise conditions for when such data are sufficient to identify the full model. While their result cannot be extended to discrete-valued variables with arbitrary cause-effect relationships, we show that a similar result can be obtained for the class of causal models whose conditional probability distributions are restricted to a `noisy-OR' parameterization. We further show that identification is preserved under an extension of the model that allows for negative influences, and present learning algorithms that we test for accuracy, scalability and robustness.


Lipschitz Parametrization of Probabilistic Graphical Models

arXiv.org Machine Learning

We show that the log-likelihood of several probabilistic graphical models is Lipschitz continuous with respect to the lp-norm of the parameters. We discuss several implications of Lipschitz parametrization. We present an upper bound of the Kullback-Leibler divergence that allows understanding methods that penalize the lp-norm of differences of parameters as the minimization of that upper bound. The expected log-likelihood is lower bounded by the negative lp-norm, which allows understanding the generalization ability of probabilistic models. The exponential of the negative lp-norm is involved in the lower bound of the Bayes error rate, which shows that it is reasonable to use parameters as features in algorithms that rely on metric spaces (e.g. classification, dimensionality reduction, clustering). Our results do not rely on specific algorithms for learning the structure or parameters. We show preliminary results for activity recognition and temporal segmentation.


PAC-Bayesian Policy Evaluation for Reinforcement Learning

arXiv.org Machine Learning

Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first PAC-Bayesian bound for the batch reinforcement learning problem with function approximation. We show how this bound can be used to perform model-selection in a transfer learning scenario. Our empirical results confirm that PAC-Bayesian policy evaluation is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, ignore them when they are misleading.


Near-Optimal Target Learning With Stochastic Binary Signals

arXiv.org Machine Learning

We study learning in a noisy bisection model: specifically, Bayesian algorithms to learn a target value V given access only to noisy realizations of whether V is less than or greater than a threshold theta. At step t = 0, 1, 2, ..., the learner sets threshold theta t and observes a noisy realization of sign(V - theta t). After T steps, the goal is to output an estimate V^ which is within an eta-tolerance of V . This problem has been studied, predominantly in environments with a fixed error probability q < 1/2 for the noisy realization of sign(V - theta t). In practice, it is often the case that q can approach 1/2, especially as theta -> V, and there is little known when this happens. We give a pseudo-Bayesian algorithm which provably converges to V. When the true prior matches our algorithm's Gaussian prior, we show near-optimal expected performance. Our methods extend to the general multiple-threshold setting where the observation noisily indicates which of k >= 2 regions V belongs to.


Belief Propagation by Message Passing in Junction Trees: Computing Each Message Faster Using GPU Parallelization

arXiv.org Artificial Intelligence

Compiling Bayesian networks (BNs) to junction trees and performing belief propagation over them is among the most prominent approaches to computing posteriors in BNs. However, belief propagation over junction tree is known to be computationally intensive in the general case. Its complexity may increase dramatically with the connectivity and state space cardinality of Bayesian network nodes. In this paper, we address this computational challenge using GPU parallelization. We develop data structures and algorithms that extend existing junction tree techniques, and specifically develop a novel approach to computing each belief propagation message in parallel. We implement our approach on an NVIDIA GPU and test it using BNs from several applications. Experimentally, we study how junction tree parameters affect parallelization opportunities and hence the performance of our algorithm. We achieve speedups ranging from 0.68 to 9.18 for the BNs studied.


Measuring the Hardness of Stochastic Sampling on Bayesian Networks with Deterministic Causalities: the k-Test

arXiv.org Artificial Intelligence

Approximate Bayesian inference is NP-hard. Dagum and Luby defined the Local Variance Bound (LVB) to measure the approximation hardness of Bayesian inference on Bayesian networks, assuming the networks model strictly positive joint probability distributions, i.e. zero probabilities are not permitted. This paper introduces the k-test to measure the approximation hardness of inference on Bayesian networks with deterministic causalities in the probability distribution, i.e. when zero conditional probabilities are permitted. Approximation by stochastic sampling is a widely-used inference method that is known to suffer from inefficiencies due to sample rejection. The k-test predicts when rejection rates of stochastic sampling a Bayesian network will be low, modest, high, or when sampling is intractable.