Bayesian Inference
On the Sample Complexity of Learning Bayesian Networks
In recent years there has been an increasing interest in learning Bayesian networks from data. One of the most effective methods for learning such networks is based on the minimum description length (MDL) principle. Previous work has shown that this learning procedure is asymptotically successful: with probability one, it will converge to the target distribution, given a sufficient number of samples. However, the rate of this convergence has been hitherto unknown. In this work we examine the sample complexity of MDL based learning procedures for Bayesian networks. We show that the number of samples needed to learn an epsilon-close approximation (in terms of entropy distance) with confidence delta is O((1/epsilon)^(4/3)log(1/epsilon)log(1/delta)loglog (1/delta)). This means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound. We also discuss how the constants in this term depend on the complexity of the target distribution. Finally, we address questions of asymptotic minimality and propose a method for using the sample complexity results to speed up the learning process.
Learning Bayesian Networks with Local Structure
Friedman, Nir, Goldszmidt, Moises
In this paper we examine a novel addition to the known methods for learning Bayesian networks from data that improves the quality of the learned networks. Our approach explicitly represents and learns the local structure in the conditional probability tables (CPTs), that quantify these networks. This increases the space of possible models, enabling the representation of CPTs with a variable number of parameters that depends on the learned local structures. The resulting learning procedure is capable of inducing models that better emulate the real complexity of the interactions present in the data. We describe the theoretical foundations and practical aspects of learning local structures, as well as an empirical evaluation of the proposed method. This evaluation indicates that learning curves characterizing the procedure that exploits the local structure converge faster than these of the standard procedure. Our results also show that networks learned with local structure tend to be more complex (in terms of arcs), yet require less parameters.
Quasi-Bayesian Strategies for Efficient Plan Generation: Application to the Planning to Observe Problem
Cozman, Fabio Gagliardi, Krotkov, Eric
Quasi-Bayesian theory uses convex sets of probability distributions and expected loss to represent preferences about plans. The theory focuses on decision robustness, i.e., the extent to which plans are affected by deviations in subjective assessments of probability. The present work presents solutions for plan generation when robustness of probability assessments must be included: plans contain information about the robustness of certain actions. The surprising result is that some problems can be solved faster in the Quasi-Bayesian framework than within usual Bayesian theory. We investigate this on the planning to observe problem, i.e., an agent must decide whether to take new observations or not. The fundamental question is: How, and how much, to search for a "best" plan, based on the robustness of probability assessments? Plan generation algorithms are derived in the context of material classification with an acoustic robotic probe. A package that constructs Quasi-Bayesian plans is available through anonymous ftp.
Bayesian Learning of Loglinear Models for Neural Connectivity
Laskey, Kathryn Blackmond, Martignon, Laura
This paper presents a Bayesian approach to learning the connectivity structure of a group of neurons from data on configuration frequencies. A major objective of the research is to provide statistical tools for detecting changes in firing patterns with changing stimuli. Our framework is not restricted to the well-understood case of pair interactions, but generalizes the Boltzmann machine model to allow for higher order interactions. The paper applies a Markov Chain Monte Carlo Model Composition (MC3) algorithm to search over connectivity structures and uses Laplace's method to approximate posterior probabilities of structures. Performance of the methods was tested on synthetic data. The models were also applied to data obtained by Vaadia on multi-unit recordings of several neurons in the visual cortex of a rhesus monkey in two different attentional states. Results confirmed the experimenters' conjecture that different attentional states were associated with different interaction structures.
Flexible Policy Construction by Information Refinement
Horsch, Michael C., Poole, David L.
We report on work towards flexible algorithms for solving decision problems represented as influence diagrams. An algorithm is given to construct a tree structure for each decision node in an influence diagram. Each tree represents a decision function and is constructed incrementally. The improvements to the tree converge to the optimal decision function (neglecting computational costs) and the asymptotic behaviour is only a constant factor worse than dynamic programming techniques, counting the number of Bayesian network queries. Empirical results show how expected utility increases with the size of the tree and the number of Bayesian net calculations.
Why Is Diagnosis Using Belief Networks Insensitive to Imprecision In Probabilities?
Henrion, Max, Pradhan, Malcolm, del Favero, Brendan, Huang, Kurt, Provan, Gregory M., O'Rorke, Paul
Recent research has found that diagnostic performance with Bayesian belief networks is often surprisingly insensitive to imprecision in the numerical probabilities. For example, the authors have recently completed an extensive study in which they applied random noise to the numerical probabilities in a set of belief networks for medical diagnosis, subsets of the CPCS network, a subset of the QMR (Quick Medical Reference) focused on liver and bile diseases. The diagnostic performance in terms of the average probabilities assigned to the actual diseases showed small sensitivity even to large amounts of noise. In this paper, we summarize the findings of this study and discuss possible explanations of this low sensitivity. One reason is that the criterion for performance is average probability of the true hypotheses, rather than average error in probability, which is insensitive to symmetric noise distributions. But, we show that even asymmetric, logodds-normal noise has modest effects. A second reason is that the gold-standard posterior probabilities are often near zero or one, and are little disturbed by noise.
Some Experiments with Real-Time Decision Algorithms
D'Ambrosio, Bruce, Burgess, Scott
Real-time Decision algorithms are a class of incremental resource-bounded [Horvitz, 89] or anytime [Dean, 93] algorithms for evaluating influence diagrams. We present a test domain for real-time decision algorithms, and the results of experiments with several Real-time Decision Algorithms in this domain. The results demonstrate high performance for two algorithms, a decision-evaluation variant of Incremental Probabilisitic Inference [D'Ambrosio, 93] and a variant of an algorithm suggested by Goldszmidt, [Goldszmidt, 95], PKreduced. We discuss the implications of these experimental results and explore the broader applicability of these algorithms.
Independence with Lower and Upper Probabilities
It is shown that the ability of the interval probability representation to capture epistemological independence is severely limited. Two events are epistemologically independent if knowledge of the first event does not alter belief (i.e., probability bounds) about the second. However, independence in this form can only exist in a 2-monotone probability function in degenerate cases i.e., if the prior bounds are either point probabilities or entirely vacuous. Additional limitations are characterized for other classes of lower probabilities as well. It is argued that these phenomena are simply a matter of interpretation. They appear to be limitations when one interprets probability bounds as a measure of epistemological indeterminacy (i.e., uncertainty arising from a lack of knowledge), but are exactly as one would expect when probability intervals are interpreted as representations of ontological indeterminacy (indeterminacy introduced by structural approximations). The ontological interpretation is introduced and discussed.
Tail Sensitivity Analysis in Bayesian Networks
Castillo, Enrique F., Solares, Cristina, Gomez, Patricia
The paper presents an efficient method for simulating the tails of a target variable Z=h(X) which depends on a set of basic variables X=(X_1, ..., X_n). To this aim, variables X_i, i=1, ..., n are sequentially simulated in such a manner that Z=h(x_1, ..., x_i-1, X_i, ..., X_n) is guaranteed to be in the tail of Z. When this method is difficult to apply, an alternative method is proposed, which leads to a low rejection proportion of sample values, when compared with the Monte Carlo method. Both methods are shown to be very useful to perform a sensitivity analysis of Bayesian networks, when very large confidence intervals for the marginal/conditional probabilities are required, as in reliability or risk analysis. The methods are shown to behave best when all scores coincide. The required modifications for this to occur are discussed. The methods are illustrated with several examples and one example of application to a real case is used to illustrate the whole process.