 Bayesian Inference


Bayesian Error-Bars for Belief Net Inference

arXiv.org Artificial Intelligence

A Bayesian Belief Network (BN) is a model of a joint distribution over a set of n variables, with a DAG structure to represent the immediate dependencies between the variables, and a set of parameters (aka CPtables) to represent the local conditional probabilities of a node, given each assignment to its parents. In many situations, these parameters are themselves random variables; this may reflect the uncertainty of the domain expert, or may come from a training sample used to estimate the parameter values. The distribution over these "CPtable variables" induces a distribution over the response the BN will return to any "What is Pr(H | E)?" query. This paper investigates the variance of this response, showing first that it is asymptotically normal, then providing its mean and asymptotic variance. We then present an effective general algorithm for computing this variance, which has the same complexity as simply computing the (mean value of) the response itself, i.e., O(n 2^w), where n is the number of variables and w is the effective tree width. Finally, we provide empirical evidence that this algorithm, which incorporates assumptions and approximations, works effectively in practice, given only small samples.
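To make the notion of a "distribution over the response" concrete, the following is a minimal sketch that estimates the mean and variance of a query by brute-force sampling. It is not the analytic algorithm the abstract describes. The two-node network A -> B, the Beta pseudo-counts, and all function names are assumptions made for illustration.

```python
import random
import statistics

def query_a_given_b(p_a, p_b_given_a1, p_b_given_a0):
    """Pr(A=1 | B=1) in the two-node network A -> B, by Bayes' rule."""
    num = p_a * p_b_given_a1
    den = num + (1.0 - p_a) * p_b_given_a0
    return num / den

def response_mean_and_variance(pseudo_counts, n_samples=5000, seed=0):
    """Sample each CPtable entry from its Beta distribution and return the
    sample mean and sample variance of the induced query response."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_samples):
        theta = {name: rng.betavariate(a, b)
                 for name, (a, b) in pseudo_counts.items()}
        responses.append(
            query_a_given_b(theta["a"], theta["b|a1"], theta["b|a0"]))
    return statistics.mean(responses), statistics.variance(responses)

# Hypothetical Beta(alpha, beta) parameters, e.g. from a small training sample.
SMALL = {"a": (3, 7), "b|a1": (4, 2), "b|a0": (2, 6)}
# The same proportions backed by 50 times as much data.
LARGE = {name: (50 * a, 50 * b) for name, (a, b) in SMALL.items()}

mean_small, var_small = response_mean_and_variance(SMALL)
mean_large, var_large = response_mean_and_variance(LARGE)
```

With more data behind each parameter, the spread of the response shrinks, which is the kind of error bar the abstract is concerned with.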


Maximum Likelihood Bounded Tree-Width Markov Networks

arXiv.org Artificial Intelligence

Chow and Liu (1968) studied the problem of learning a maximum likelihood Markov tree. We generalize their work to more complex Markov networks by considering the problem of learning a maximum likelihood Markov network of bounded complexity. We discuss how tree-width is in many ways the appropriate measure of complexity and thus analyze the problem of learning a maximum likelihood Markov network of bounded tree-width. Similar to the work of Chow and Liu, we are able to formalize the learning problem as a combinatorial optimization problem on graphs. We show that learning a maximum likelihood Markov network of bounded tree-width is equivalent to finding a maximum weight hypertree. This equivalence gives rise to global, integer-programming based, approximation algorithms with provable performance guarantees, for the learning problem. This contrasts with heuristic local-search algorithms which were previously suggested (e.g. by Malvestuto 1991). The equivalence also allows us to study the computational hardness of the learning problem. We show that learning a maximum likelihood Markov network of bounded tree-width is NP-hard, and discuss the hardness of approximation.
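For orientation, the tree case that this work generalizes can be sketched as follows: the maximum likelihood Markov tree is a maximum weight spanning tree under empirical mutual information weights. The sketch covers only the Chow and Liu baseline, not the bounded tree-width hypertree algorithm; the function names and toy data are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    return sum((c / n) * math.log(c * n / (count_i[a] * count_j[b]))
               for (a, b), c in count_ij.items())

def chow_liu_tree(data):
    """Return the edges of a maximum weight spanning tree (Kruskal's
    algorithm with union-find) using mutual information as edge weight."""
    n_vars = len(data[0])
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)),
                   reverse=True)
    parent = list(range(n_vars))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding the edge keeps the graph acyclic
            parent[root_i] = root_j
            tree.append((i, j))
    return tree

# Toy data: column 1 copies column 0; column 2 agrees with them 80% of the time.
rows = ([(0, 0, 0)] * 4 + [(0, 0, 1)] + [(1, 1, 1)] * 4 + [(1, 1, 0)])
tree = chow_liu_tree(rows)
```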


Toward General Analysis of Recursive Probability Models

arXiv.org Artificial Intelligence

There is increasing interest within the research community in the design and use of recursive probability models. Although there still remains concern about computational complexity costs and the fact that computing exact solutions can be intractable for many nonrecursive models and impossible in the general case for recursive problems, several research groups are actively developing computational techniques for recursive stochastic languages. We have developed an extension to the traditional lambda-calculus as a framework for families of Turing complete stochastic languages. We have also developed a class of exact inference algorithms based on the traditional reductions of the lambda-calculus. We further propose that using the de Bruijn notation (a lambda-calculus notation with nameless dummies) supports effective caching in such systems (caching being an essential component of efficient computation). Finally, our extension to the lambda-calculus offers a foundation and general theory for the construction of recursive stochastic modeling languages as well as promise for effective caching and efficient approximation algorithms for inference.
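The caching argument rests on a property of the nameless notation: terms that differ only in the names of bound variables become structurally identical, so they can share one cache entry. A minimal sketch for the plain (non-stochastic) lambda-calculus follows; the tuple encoding, the 0-based indices, and the function name are assumptions, not the paper's representation.

```python
def to_de_bruijn(term, env=()):
    """Convert a named lambda term to de Bruijn form.

    Named terms:    ("var", name) | ("lam", name, body) | ("app", f, a)
    Nameless terms: ("var", index) | ("lam", body)      | ("app", f, a)
    `env` lists the enclosing binder names, innermost first, so a variable's
    index is the number of binders between it and the one that binds it.
    """
    kind = term[0]
    if kind == "var":
        # tuple.index raises ValueError for a free (unbound) variable.
        return ("var", env.index(term[1]))
    if kind == "lam":
        _, name, body = term
        return ("lam", to_de_bruijn(body, (name,) + env))
    if kind == "app":
        _, func, arg = term
        return ("app", to_de_bruijn(func, env), to_de_bruijn(arg, env))
    raise ValueError(f"unknown term kind: {kind!r}")

# Two spellings of the K combinator, differing only in variable names.
k_xy = ("lam", "x", ("lam", "y", ("var", "x")))
k_ab = ("lam", "a", ("lam", "b", ("var", "a")))

# Nameless terms are hashable tuples, so they can key a cache directly.
cache = {to_de_bruijn(k_xy): "result computed once"}
hit = cache.get(to_de_bruijn(k_ab))
```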


Sufficiency, Separability and Temporal Probabilistic Models

arXiv.org Artificial Intelligence

Suppose we are given the conditional probability of one variable given some other variables. Normally the full joint distribution over the conditioning variables is required to determine the probability of the conditioned variable. Under what circumstances are the marginal distributions over the conditioning variables sufficient to determine the probability of the conditioned variable? Sufficiency in this sense is equivalent to additive separability of the conditional probability distribution. Such separability structure is natural and can be exploited for efficient inference. Separability has a natural generalization to conditional separability. Separability provides a precise notion of weakly interacting subsystems in temporal probabilistic models. Given a system that is decomposed into separable subsystems, exact marginal probabilities over subsystems at future points in time can be computed by propagating marginal subsystem probabilities, rather than complete system joint probabilities. Thus, separability can make exact prediction tractable. However, observations can break separability, so exact monitoring of dynamic systems remains hard.
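A small numeric check may help illustrate the sufficiency claim. The sketch below takes two joint distributions over (X1, X2) with identical marginals but different correlation. It compares the resulting Pr(Y=1) for an additively separable conditional and for a non-separable one. All numbers and names are made up for illustration.

```python
from itertools import product

def prob_y(joint, conditional):
    """Pr(Y=1) = sum over (x1, x2) of Pr(x1, x2) * Pr(Y=1 | x1, x2)."""
    return sum(p * conditional(x1, x2) for (x1, x2), p in joint.items())

# Both joints have uniform marginals over X1 and over X2.
independent = {(x1, x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}
correlated = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

F1 = {0: 0.1, 1: 0.3}
F2 = {0: 0.2, 1: 0.4}

def separable(x1, x2):
    """Additively separable: Pr(Y=1 | x1, x2) = f1(x1) + f2(x2)."""
    return F1[x1] + F2[x2]

def non_separable(x1, x2):
    """Contains an interaction term, so it is not additively separable."""
    return 0.1 + 0.8 * x1 * x2

sep_gap = abs(prob_y(independent, separable) - prob_y(correlated, separable))
non_sep_gap = abs(prob_y(independent, non_separable)
                  - prob_y(correlated, non_separable))
```

For the separable conditional the gap is zero up to rounding, because the expectation of f1(X1) + f2(X2) depends only on the two marginals. The interaction term makes the second gap clearly non-zero.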


Approximating MAP using Local Search

arXiv.org Artificial Intelligence

MAP is the problem of finding a most probable instantiation of a set of variables in a Bayesian network, given evidence. Unlike computing marginals, posteriors, and MPE (a special case of MAP), the time and space complexity of MAP is not only exponential in the network treewidth, but also in a larger parameter known as the "constrained" treewidth. In practice, this means that computing MAP can be orders of magnitude more expensive than computing posteriors or MPE. Thus, practitioners generally avoid MAP computations, resorting instead to approximating them by the most likely value for each MAP variable separately, or by MPE. We present a method for approximating MAP using local search. This method has space complexity which is exponential only in the treewidth, as is the complexity of each search step. We investigate the effectiveness of different local search methods and several initialization strategies and compare them to other approximation schemes. Experimental results show that local search provides a much more accurate approximation of MAP, while requiring few search steps. Practically, this means that the complexity of local search is often exponential only in treewidth as opposed to the constrained treewidth, making approximating MAP as efficient as other computations.
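The general idea of local search over MAP variables can be sketched with plain hill climbing on a four-node toy network. This shows only the search loop: it is not the specific search methods or initialization strategies the paper evaluates. Each candidate is scored here by brute-force summation rather than by treewidth-bounded inference. The network (A -> C <- B, C -> D), its numbers, and the function names are assumptions.

```python
from itertools import product

# Toy network: MAP variables A and B, hidden variable C, evidence D = 1.
P_A = {0: 0.7, 1: 0.3}
P_B = {0: 0.4, 1: 0.6}
P_C1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # Pr(C=1 | A, B)
P_D1 = {0: 0.2, 1: 0.8}                                       # Pr(D=1 | C)

def score(assignment):
    """Pr(A=a, B=b, D=1), summing out the non-MAP variable C."""
    a, b = assignment
    total = 0.0
    for c in (0, 1):
        p_c = P_C1[(a, b)] if c == 1 else 1.0 - P_C1[(a, b)]
        total += p_c * P_D1[c]
    return P_A[a] * P_B[b] * total

def local_search_map(start, max_steps=100):
    """Greedy hill climbing: flip the single MAP variable that most improves
    the score, and stop when no single flip improves it."""
    current = tuple(start)
    best = score(current)
    for _ in range(max_steps):
        neighbours = [current[:i] + (1 - current[i],) + current[i + 1:]
                      for i in range(len(current))]
        candidate = max(neighbours, key=score)
        if score(candidate) <= best:
            break
        current, best = candidate, score(candidate)
    return current, best

def exhaustive_map():
    """Reference answer by enumerating every MAP assignment."""
    return max(product((0, 1), repeat=2), key=score)

found, found_score = local_search_map((1, 0))
```

On this toy problem the search reaches the exhaustive optimum from every starting point, but hill climbing in general can stop at a local maximum.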


Recognition Networks for Approximate Inference in BN20 Networks

arXiv.org Artificial Intelligence

We propose using recognition networks for approximate inference in Bayesian networks (BNs). A recognition network is a multilayer perceptron (MLP) trained to predict posterior marginals given observed evidence in a particular BN. The input to the MLP is a vector of the states of the evidential nodes. The activity of an output unit is interpreted as a prediction of the posterior marginal of the corresponding variable. The MLP is trained using samples generated from the corresponding BN. We evaluate a recognition network that was trained to do inference in a large Bayesian network, similar in structure and complexity to the Quick Medical Reference, Decision Theoretic (QMR-DT). Our network is a binary, two-layer, noisy-OR network containing over 4000 potentially observable nodes and over 600 unobservable, hidden nodes. In real medical diagnosis, most observables are unavailable, and there is a complex and unknown bias that selects which ones are provided. We incorporate a very basic type of selection bias in our network: a known preference that available observables are positive rather than negative. Even this simple bias has a significant effect on the posterior. We compare the performance of our recognition network to state-of-the-art approximate inference algorithms on a large set of test cases. In order to evaluate the effect of our simplistic model of the selection bias, we evaluate algorithms using a variety of incorrectly modeled observation biases. Recognition networks perform well using both correct and incorrect observation biases.
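For readers unfamiliar with the noisy-OR gate that such two-layer networks are built from, here is a minimal sketch of the conditional probability of one observable given its hidden parents. The leak term, the parent names, and the strengths are assumptions for illustration and are not taken from the paper.

```python
def noisy_or(parent_states, strengths, leak=0.0):
    """Pr(finding = 1 | parent states) under a noisy-OR gate.

    Each active parent independently fails to trigger the finding with
    probability 1 - strengths[parent]; `leak` is the probability that the
    finding appears with no active parent at all.
    """
    p_off = 1.0 - leak
    for parent, is_on in parent_states.items():
        if is_on:
            p_off *= 1.0 - strengths[parent]
    return 1.0 - p_off

# Hypothetical hidden causes and their strengths for one finding.
strengths = {"d1": 0.6, "d2": 0.3}
p_none = noisy_or({"d1": False, "d2": False}, strengths, leak=0.01)
p_one = noisy_or({"d1": True, "d2": False}, strengths, leak=0.01)
p_both = noisy_or({"d1": True, "d2": True}, strengths, leak=0.01)
```

Training data for a recognition network could then be produced by sampling the hidden nodes from their priors and each observable from this conditional.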


Expectation Propagation for approximate Bayesian inference

arXiv.org Artificial Intelligence

This paper presents a new deterministic approximation technique in Bayesian networks. This method, "Expectation Propagation", unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks. All three algorithms try to recover an approximate distribution which is close in KL divergence to the true distribution. Loopy belief propagation, because it propagates exact belief states, is useful for a limited class of belief networks, such as those which are purely discrete. Expectation Propagation approximates the belief states by only retaining certain expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network. This makes it applicable to hybrid networks with discrete and continuous nodes. Expectation Propagation also extends belief propagation in the opposite direction - it can propagate richer belief states that incorporate correlations between nodes. Experiments with Gaussian mixture models show Expectation Propagation to be convincingly better than methods with similar computational cost: Laplace's method, variational Bayes, and Monte Carlo. Expectation Propagation also provides an efficient algorithm for training Bayes point machine classifiers.


Aggregating Learned Probabilistic Beliefs

arXiv.org Artificial Intelligence

We consider the task of aggregating beliefs of several experts. We assume that these beliefs are represented as probability distributions. We argue that the evaluation of any aggregation technique depends on the semantic context of this task. We propose a framework, in which we assume that nature generates samples from a `true' distribution and different experts form their beliefs based on the subsets of the data they have a chance to observe. Naturally, the ideal aggregate distribution would be the one learned from the combined sample sets. Such a formulation leads to a natural way to measure the accuracy of the aggregation mechanism. We show that the well-known aggregation operator LinOP is ideally suited for that task. We propose a LinOP-based learning algorithm, inspired by the techniques developed for Bayesian learning, which aggregates the experts' distributions represented as Bayesian networks. Our preliminary experiments show that this algorithm performs well in practice.
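LinOP (the linear opinion pool) is a weighted average of the experts' distributions. The sketch below covers the simplest setting, a single discrete variable, where each expert's belief is the empirical distribution of that expert's own sample. Weighting by sample size then reproduces the distribution learned from the combined sample. The sample-size weights and all names are assumptions; the paper's algorithm for Bayesian networks is not reproduced here.

```python
from collections import Counter

def empirical(sample, outcomes):
    """Maximum likelihood (relative frequency) distribution of a sample."""
    counts = Counter(sample)
    return {o: counts[o] / len(sample) for o in outcomes}

def linop(distributions, weights):
    """Linear opinion pool: weighted average of the given distributions."""
    total = sum(weights)
    outcomes = distributions[0].keys()
    return {o: sum(w * d[o] for w, d in zip(weights, distributions)) / total
            for o in outcomes}

OUTCOMES = ("r", "g", "b")
sample_1 = ["r", "r", "g", "b"]                # seen by expert 1
sample_2 = ["g", "g", "g", "b", "b", "r"]      # seen by expert 2

beliefs = [empirical(sample_1, OUTCOMES), empirical(sample_2, OUTCOMES)]
pooled = linop(beliefs, weights=[len(sample_1), len(sample_2)])
ideal = empirical(sample_1 + sample_2, OUTCOMES)
```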


A Bayesian Multiresolution Independence Test for Continuous Variables

arXiv.org Artificial Intelligence

In this paper we present a method of computing the posterior probability of conditional independence of two or more continuous variables from data, examined at several resolutions. Our approach is motivated by the observation that the appearance of continuous data varies widely at various resolutions, producing very different independence estimates between the variables involved. Therefore, it is difficult to ascertain independence without examining data at several carefully selected resolutions. In our paper, we accomplish this using the exact computation of the posterior probability of independence, calculated analytically given a resolution. At each examined resolution, we assume a multinomial distribution with Dirichlet priors for the discretized table parameters, and compute the posterior using Bayesian integration. Across resolutions, we use a search procedure to approximate the Bayesian integral of probability over an exponential number of possible histograms. Our method generalizes to an arbitrary number of variables in a straightforward manner. The test is suitable for Bayesian network learning algorithms that use independence tests to infer the network structure, in domains that contain any mix of continuous, ordinal and categorical variables.
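The fixed-resolution step can be illustrated with the standard Dirichlet-multinomial marginal likelihood. It compares a model with one multinomial over all table cells (dependence) against a model with separate multinomials over row and column marginals (independence). This is a sketch of the textbook calculation for two variables at a single resolution. The symmetric Dirichlet(1) priors, the 0.5 prior on independence, and the function names are assumptions, and the search across resolutions is not covered.

```python
import math

def log_marginal_likelihood(counts, alpha=1.0):
    """Log probability of an observed sequence with these category counts
    under a multinomial with a symmetric Dirichlet(alpha) prior."""
    k, n = len(counts), sum(counts)
    out = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out

def posterior_independence(table, prior_indep=0.5, alpha=1.0):
    """Posterior probability that the row and column variables of a
    contingency table (list of rows of counts) are independent."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    cells = [c for r in table for c in r]
    log_indep = (log_marginal_likelihood(rows, alpha)
                 + log_marginal_likelihood(cols, alpha)
                 + math.log(prior_indep))
    log_dep = (log_marginal_likelihood(cells, alpha)
               + math.log(1.0 - prior_indep))
    # Normalise in log space to avoid underflow for large tables.
    m = max(log_indep, log_dep)
    w_indep, w_dep = math.exp(log_indep - m), math.exp(log_dep - m)
    return w_indep / (w_indep + w_dep)

p_diag = posterior_independence([[50, 0], [0, 50]])    # strongly dependent
p_flat = posterior_independence([[25, 25], [25, 25]])  # looks independent
```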


Solving Influence Diagrams using HUGIN, Shafer-Shenoy and Lazy Propagation

arXiv.org Artificial Intelligence

In this paper we compare three different architectures for the evaluation of influence diagrams: HUGIN, Shafer-Shenoy, and Lazy Evaluation architecture. The computational complexity of the architectures is compared on the LImited Memory Influence Diagram (LIMID): a diagram where only the requisite information for the computation of the optimal policies is depicted. Because the requisite information is explicitly represented in the LIMID, the evaluation can take advantage of it, and significant savings in computation can be obtained. In this paper we show how the obtained savings increase considerably when the computations performed on the LIMID follow the Lazy Evaluation scheme.