 Bayesian Inference


A Temporal Bayesian Network for Diagnosis and Prediction

arXiv.org Artificial Intelligence

Diagnosis and prediction in some domains, like medical and industrial diagnosis, require a representation that combines uncertainty management and temporal reasoning. Based on the fact that in many cases there are few state changes in the temporal range of interest, we propose a novel representation called Temporal Nodes Bayesian Networks (TNBN). In a TNBN each node represents an event or state change of a variable, and an arc corresponds to a causal-temporal relationship. The temporal intervals can differ in number and size for each temporal node, so this allows multiple granularity. Our approach is contrasted with a dynamic Bayesian network for a simple medical example. An empirical evaluation is presented for a more complex problem, a subsystem of a fossil power plant, in which this approach is used for fault diagnosis and prediction with good results.


Non-parametric Bayesian modelling of digital gene expression data

arXiv.org Machine Learning

Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads", which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these and even de novo methods for the analysis of RNA-seq data are plagued by the common problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model for modelling over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of model parameters conditional on these counts. The algorithm compensates for the problem of low numbers of biological replicates by clustering together genes with tag counts that are likely sampled from a common distribution and using this augmented sample for estimating the parameters of this distribution. The number of clusters is not decided a priori, but it is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm on a public dataset obtained from cancerous and non-cancerous neural tissues.


Follow the Leader If You Can, Hedge If You Must

arXiv.org Machine Learning

Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has terrible performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combines the best of both worlds. As part of our construction, we develop AdaHedge, which is a new way of dynamically tuning the learning rate in Hedge without using the doubling trick. AdaHedge refines a method by Cesa-Bianchi, Mansour and Stoltz (2007), yielding slightly improved worst-case guarantees. By interleaving AdaHedge and FTL, the FlipFlop algorithm achieves regret within a constant factor of the FTL regret, without sacrificing AdaHedge's worst-case guarantees. AdaHedge and FlipFlop do not need to know the range of the losses in advance; moreover, unlike earlier methods, both have the intuitive property that the issued weights are invariant under rescaling and translation of the losses. The losses are also allowed to be negative, in which case they may be interpreted as gains.
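The contrast between FTL and hedging can be illustrated with a minimal sketch. It implements FTL and plain Hedge with a fixed learning rate; it is not AdaHedge or FlipFlop, whose tuning rules the abstract does not give. The two-expert loss sequence and the helper name `adversarial_losses` are assumptions chosen to make FTL switch leaders every round.

```python
import math

def follow_the_leader(losses):
    """Total loss of FTL: each round, play the expert with the smallest
    cumulative loss so far (ties go to the lowest index)."""
    k = len(losses[0])
    cumulative = [0.0] * k
    total = 0.0
    for round_losses in losses:
        leader = min(range(k), key=lambda i: cumulative[i])
        total += round_losses[leader]
        for i in range(k):
            cumulative[i] += round_losses[i]
    return total

def hedge(losses, eta):
    """Expected total loss of Hedge with a fixed learning rate eta:
    weights are proportional to exp(-eta * cumulative loss)."""
    k = len(losses[0])
    cumulative = [0.0] * k
    total = 0.0
    for round_losses in losses:
        shift = min(cumulative)  # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (c - shift)) for c in cumulative]
        norm = sum(weights)
        total += sum(w / norm * l for w, l in zip(weights, round_losses))
        for i in range(k):
            cumulative[i] += round_losses[i]
    return total

def best_expert_loss(losses):
    """Cumulative loss of the single best expert in hindsight."""
    k = len(losses[0])
    return min(sum(r[i] for r in losses) for i in range(k))

def adversarial_losses(pairs):
    """Hypothetical worst case for FTL: after one tie-breaking round, the
    current leader always suffers loss 1 and the other expert loss 0."""
    rounds = [(0.5, 0.0)]
    for t in range(2 * pairs):
        rounds.append((0.0, 1.0) if t % 2 == 0 else (1.0, 0.0))
    return rounds

losses = adversarial_losses(50)  # 101 rounds, losses in [0, 1]
eta = math.sqrt(8 * math.log(2) / len(losses))
ftl_regret = follow_the_leader(losses) - best_expert_loss(losses)
hedge_regret = hedge(losses, eta) - best_expert_loss(losses)
```

On this sequence FTL's regret grows linearly with the number of rounds, while fixed-rate Hedge stays within the standard bound sqrt(T ln K / 2). That bound requires knowing T and the loss range in advance, which is the dependence the abstract says AdaHedge and FlipFlop remove.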


Game Networks

arXiv.org Artificial Intelligence

We introduce Game networks (G nets), a novel representation for multi-agent decision problems. Compared to other game-theoretic representations, such as strategic or extensive forms, G nets are more structured and more compact; more fundamentally, G nets constitute a computationally advantageous framework for strategic inference, as both probability and utility independencies are captured in the structure of the network and can be exploited in order to simplify the inference process. An important aspect of multi-agent reasoning is the identification of some or all of the strategic equilibria in a game; we present original convergence methods for strategic equilibrium which can take advantage of strategic separabilities in the G net structure in order to simplify the computations. Specifically, we describe a method which identifies a unique equilibrium as a function of the game payoffs, and one which identifies all equilibria.


Monte Carlo Inference via Greedy Importance Sampling

arXiv.org Machine Learning

We present a new method for conducting Monte Carlo inference in graphical models which combines explicit search with generalized importance sampling. The idea is to reduce the variance of importance sampling by searching for significant points in the target distribution. We prove that it is possible to introduce search and still maintain unbiasedness. We then demonstrate our procedure on a few simple inference tasks and show that it can improve the inference quality of standard MCMC methods, including Gibbs sampling, Metropolis sampling, and Hybrid Monte Carlo. This paper extends previous work which showed how greedy importance sampling could be correctly realized in the one-dimensional case.


Being Bayesian about Network Structure

arXiv.org Machine Learning

In many domains, we are interested in analyzing the structure of the underlying distribution, e.g., whether one variable is a direct parent of the other. Bayesian model-selection attempts to find the MAP model and use its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want to compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed ordering over network variables. This allows us to compute, for a given ordering, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov Chain Monte Carlo (MCMC) method, but over orderings rather than over network structures. The space of orderings is much smaller and more regular than the space of structures, and has a smoother posterior 'landscape'. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.
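One way to see why the sum over networks consistent with a fixed ordering can be cheap is sketched below. It assumes, beyond what the abstract states, that a network's score is a product of per-node terms and that the number of parents per node is bounded; under those assumptions the sum over structures factorises into a product of per-node sums. The function `toy_local_score` is a hypothetical stand-in for a real local marginal likelihood times prior.

```python
from itertools import combinations, product

def parent_sets(predecessors, max_parents):
    """All parent sets of size <= max_parents drawn from the predecessors."""
    for size in range(min(max_parents, len(predecessors)) + 1):
        for subset in combinations(predecessors, size):
            yield frozenset(subset)

def order_marginal(order, local_score, max_parents):
    """Sum of network scores over all DAGs consistent with `order`,
    computed as a product over nodes of sums over candidate parent sets."""
    total = 1.0
    for position, node in enumerate(order):
        candidates = parent_sets(order[:position], max_parents)
        total *= sum(local_score(node, parents) for parents in candidates)
    return total

def brute_force_marginal(order, local_score, max_parents):
    """The same quantity by explicit enumeration of every consistent DAG."""
    choices = [list(parent_sets(order[:pos], max_parents))
               for pos in range(len(order))]
    total = 0.0
    for assignment in product(*choices):
        term = 1.0
        for node, parents in zip(order, assignment):
            term *= local_score(node, parents)
        total += term
    return total

def toy_local_score(node, parents):
    """Hypothetical local score; any positive function works for the identity."""
    return 1.0 / (1 + len(parents)) + 0.1 * "ABC".index(node)

order = ("A", "B", "C")
fast = order_marginal(order, toy_local_score, max_parents=2)
slow = brute_force_marginal(order, toy_local_score, max_parents=2)
```

Both functions return the same value, but the factorised version touches each candidate parent set once per node instead of enumerating every combination.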


Gaussian Process Networks

arXiv.org Machine Learning

In this paper we address the problem of learning the structure of a Bayesian network in domains with continuous variables. This task requires a procedure for comparing different candidate structures. In the Bayesian framework, this is done by evaluating the marginal likelihood of the data given a candidate structure. This term can be computed in closed-form for standard parametric families (e.g., Gaussians), and can be approximated, at some computational cost, for some semi-parametric families (e.g., mixtures of Gaussians). We present a new family of continuous variable probabilistic networks that are based on Gaussian Process priors. These priors are semi-parametric in nature and can learn almost arbitrary noisy functional relations. Using these priors, we can directly compute marginal likelihoods for structure learning. The resulting method can discover a wide range of functional dependencies in multivariate data. We develop the Bayesian score of Gaussian Process Networks and describe how to learn them from data. We present empirical results on artificial data as well as on real-life domains with non-linear dependencies.
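As an illustration of a closed-form marginal likelihood under a Gaussian Process prior, the sketch below scores one node given the values of a candidate parent set. The zero-mean prior, the squared-exponential kernel, and the fixed hyperparameters `noise_var` and `length_scale` are all assumptions; the abstract does not say which kernel the paper uses or how it treats hyperparameters.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two parent-value vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def gp_log_marginal_likelihood(inputs, targets, noise_var=0.1, length_scale=1.0):
    """log p(targets | inputs) for a zero-mean GP with Gaussian noise:
    -0.5 * y^T K^{-1} y - 0.5 * log|K| - (n / 2) * log(2 * pi)."""
    n = len(targets)
    gram = [[rbf_kernel(inputs[i], inputs[j], length_scale)
             + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    lower = cholesky(gram)
    # Forward substitution solves L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        partial = sum(lower[i][k] * z[k] for k in range(i))
        z.append((targets[i] - partial) / lower[i][i])
    quadratic = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    return -0.5 * quadratic - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)

# Score a child variable given one candidate parent (values are illustrative).
parent_values = [(0.0,), (1.0,), (2.0,)]
child_values = [0.1, 0.8, 0.9]
score = gp_log_marginal_likelihood(parent_values, child_values)
```

In a structure search, a score of this kind would be computed for each node and each candidate parent set, then combined across nodes to compare structures.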


Variational Approximations between Mean Field Theory and the Junction Tree Algorithm

arXiv.org Machine Learning

Recently, variational approximations such as the mean field approximation have received much interest. We extend the standard mean field method by using an approximating distribution that factorises into cluster potentials. This includes undirected graphs, directed acyclic graphs and junction trees. We derive generalized mean field equations to optimize the cluster potentials. We show that the method bridges the gap between the standard mean field approximation and the exact junction tree algorithm. In addition, we address the problem of how to choose the graphical structure of the approximating distribution. From the generalised mean field equations we derive rules to simplify the structure of the approximating distribution in advance without affecting the quality of the approximation. We also show how the method fits into some other variational approximations that are currently popular.
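For orientation, the standard (fully factorised) mean field method that the paper extends can be sketched as a fixed-point iteration. The model here is an assumption chosen for brevity: binary spins in {-1, +1} with p(s) proportional to exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j). The cluster-potential generalisation described in the abstract is not implemented.

```python
import math

def naive_mean_field(fields, couplings, sweeps=200):
    """Iterate m_i = tanh(h_i + sum_j J_ij m_j) until (approximately) fixed.

    fields    -- list of local fields h_i
    couplings -- symmetric matrix J with J[i][j] the coupling between i and j
    Returns the approximate means m_i = E[s_i] under the factorised distribution.
    """
    n = len(fields)
    means = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            local_field = fields[i] + sum(
                couplings[i][j] * means[j] for j in range(n) if j != i)
            means[i] = math.tanh(local_field)
    return means

fields = [0.5, -0.3]
couplings = [[0.0, 0.2],
             [0.2, 0.0]]
means = naive_mean_field(fields, couplings)
```

With all couplings set to zero the variables are independent and the iteration returns the exact means tanh(h_i). The approximation error enters only through the couplings, which is the gap that richer cluster structures aim to close.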


Adaptive Importance Sampling for Estimation in Structured Domains

arXiv.org Machine Learning

Sampling is an important tool for estimating large, complex sums and integrals over high dimensional spaces. For instance, importance sampling has been used as an alternative to exact methods for inference in belief networks. Ideally, we want to have a sampling distribution that provides optimal-variance estimators. In this paper, we present methods that improve the sampling distribution by systematically adapting it as we obtain information from the samples. We present a stochastic-gradient-descent method for sequentially updating the sampling distribution based on the direct minimization of the variance. We also present other stochastic-gradient-descent methods based on the minimization of typical notions of distance between the current sampling distribution and approximations of the target, optimal distribution. We finally validate and compare the different methods empirically by applying them to the problem of action evaluation in influence diagrams.
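The role of the sampling distribution can be made concrete with a small sketch of importance sampling for a discrete sum. It does not implement the paper's stochastic-gradient adaptation; it only compares a uniform proposal with the variance-optimal one, which for a non-negative integrand is proportional to f(x) p(x). The distribution `p`, the function `f`, and all values are illustrative assumptions.

```python
import random

def importance_estimate(f, p, q, n, rng):
    """Estimate sum_x f(x) p(x) by drawing n states from the proposal q
    and averaging the weighted terms f(x) p(x) / q(x)."""
    states = list(q)
    draws = rng.choices(states, weights=[q[s] for s in states], k=n)
    return sum(f(x) * p[x] / q[x] for x in draws) / n

# Toy target: three states, with most of the sum coming from a rare state.
p = {0: 0.7, 1: 0.2, 2: 0.1}
values = {0: 1.0, 1: 5.0, 2: 20.0}

def f(x):
    return values[x]

exact = sum(f(x) * p[x] for x in p)

uniform = {x: 1.0 / len(p) for x in p}
optimal = {x: f(x) * p[x] / exact for x in p}  # proportional to f(x) p(x)

rng = random.Random(0)
rough = importance_estimate(f, p, uniform, 20000, rng)
sharp = importance_estimate(f, p, optimal, 50, rng)
```

Under the optimal proposal every weighted term equals the exact sum, so the estimator has zero variance. That proposal depends on the unknown answer, which is why adapting the proposal from samples, as the abstract describes, is the practical route.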


Tractable Bayesian Learning of Tree Belief Networks

arXiv.org Machine Learning

In this paper we present decomposable priors, a family of priors over structure and parameters of tree belief nets for which Bayesian learning with complete observations is tractable, in the sense that the posterior is also decomposable and can be completely determined analytically in polynomial time. This follows from two main results: First, we show that factored distributions over spanning trees in a graph can be integrated in closed form. Second, we examine priors over tree parameters and show that a set of assumptions similar to (Heckerman et al. 1995) constrain the tree parameter priors to be a compactly parameterized product of Dirichlet distributions. Besides allowing for exact Bayesian learning, these results permit us to formulate a new class of tractable latent variable models in which the likelihood of a data point is computed through an ensemble average over tree structures.
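A standard identity of the kind the first result refers to is the weighted matrix-tree theorem: the sum, over all spanning trees, of the product of edge weights equals the determinant of a reduced graph Laplacian. The abstract does not name this theorem, so treating it as the relevant closed form is an assumption; the sketch below shows only the identity itself, in pure Python.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n):
                a[row][c] -= factor * a[col][c]
    return det

def spanning_tree_weight_sum(n, edge_weights):
    """Sum over spanning trees of the product of edge weights.

    n            -- number of vertices, labelled 0 .. n-1
    edge_weights -- dict mapping an undirected edge (u, v) to its weight
    """
    laplacian = [[0.0] * n for _ in range(n)]
    for (u, v), w in edge_weights.items():
        laplacian[u][u] += w
        laplacian[v][v] += w
        laplacian[u][v] -= w
        laplacian[v][u] -= w
    # Delete the first row and column, then take the determinant.
    minor = [row[1:] for row in laplacian[1:]]
    return determinant(minor)

# Triangle with weights 2, 3, 5: trees contribute 2*3 + 2*5 + 3*5.
triangle = {(0, 1): 2.0, (1, 2): 3.0, (0, 2): 5.0}
total = spanning_tree_weight_sum(3, triangle)
```

The determinant costs polynomial time in the number of vertices, whereas the number of spanning trees grows super-exponentially. A normalising constant for a factored distribution over trees can be computed this way.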