Bayesian Inference
Efficient Inference in Large Discrete Domains
In this paper we examine the problem of inference in Bayesian Networks with discrete random variables that have very large or even unbounded domains. For example, in a domain where we are trying to identify a person, we may have variables that have as domains, the set of all names, the set of all postal codes, or the set of all credit card numbers. We cannot just have big tables of the conditional probabilities, but need compact representations. We provide an inference algorithm, based on variable elimination, for belief networks containing both large domain and normal discrete random variables. We use intensional (i.e., in terms of procedures) and extensional (in terms of listing the elements) representations of conditional probabilities and of the intermediate factors.
The Revisiting Problem in Mobile Robot Map Building: A Hierarchical Bayesian Approach
Stewart, Benjamin, Ko, Jonathan, Fox, Dieter, Konolige, Kurt
We present an application of hierarchical Bayesian estimation to robot map building. The revisiting problem occurs when a robot has to decide whether it is seeing a previously-built portion of a map, or is exploring new territory. This is a difficult decision problem, requiring the probability of being outside of the current known map. To estimate this probability, we model the structure of a "typical" environment as a hidden Markov model that generates sequences of views observed by a robot navigating through the environment. A Dirichlet prior over structural models is learned from previously explored environments. Whenever a robot explores a new environment, the posterior over the model is estimated by Dirichlet hyperparameters. Our approach is implemented and tested in the context of multi-robot map merging, a particularly difficult instance of the revisiting problem. Experiments with robot data show that the technique yields strong improvements over alternative methods.
An Importance Sampling Algorithm Based on Evidence Pre-propagation
Yuan, Changhe, Druzdzel, Marek J.
Precision achieved by stochastic sampling algorithms for Bayesian networks typically deteriorates in face of extremely unlikely evidence. To address this problem, we propose the Evidence Pre-propagation Importance Sampling algorithm (EPIS-BN), an importance sampling algorithm that computes an approximate importance function by the heuristic methods: loopy belief Propagation and e-cutoff. We tested the performance of e-cutoff on three large real Bayesian networks: ANDES, CPCS, and PATHFINDER. We observed that on each of these networks the EPIS-BN algorithm gives us a considerable improvement over the current state of the art algorithm, the AIS-BN algorithm. In addition, it avoids the costly learning stage of the AIS-BN algorithm.
Practically Perfect
Meek, Christopher, Chickering, David Maxwell
The property of perfectness plays an important role in the theory of Bayesian networks. First, the existence of perfect distributions for arbitrary sets of variables and directed acyclic graphs implies that various methods for reading independence from the structure of the graph (e.g., Pearl, 1988; Lauritzen, Dawid, Larsen & Leimer, 1990) are complete. Second, the asymptotic reliability of various search methods is guaranteed under the assumption that the generating distribution is perfect (e.g., Spirtes, Glymour & Scheines, 2000; Chickering & Meek, 2002). We provide a lower-bound on the probability of sampling a non-perfect distribution when using a fixed number of bits to represent the parameters of the Bayesian network. This bound approaches zero exponentially fast as one increases the number of bits used to represent the parameters. This result implies that perfect distributions with fixed-length representations exist. We also provide a lower-bound on the number of bits needed to guarantee that a distribution sampled from a uniform Dirichlet distribution is perfect with probability greater than 1/2. This result is useful for constructing randomized reductions for hardness proofs.
Solving MAP Exactly using Systematic Search
Park, James D., Darwiche, Adnan
MAP is the problem of finding a most probable instantiation of a set of variables in a Bayesian network given some evidence. Unlike computing posterior probabilities, or MPE (a special case of MAP), the time and space complexity of structural solutions for MAP are not only exponential in the network treewidth, but in a larger parameter known as the "constrained" treewidth. In practice, this means that computing MAP can be orders of magnitude more expensive than computing posterior probabilities or MPE. This paper introduces a new, simple upper bound on the probability of a MAP solution, which admits a tradeoff between the bound quality and the time needed to compute it. The bound is shown to be generally much tighter than those of other methods of comparable complexity. We use this proposed upper bound to develop a branch-and-bound search algorithm for solving MAP exactly. Experimental results demonstrate that the search algorithm is able to solve many problems that are far beyond the reach of any structure-based method for MAP. For example, we show that the proposed algorithm can compute MAP exactly and efficiently for some networks whose constrained treewidth is more than 40.
Extending Factor Graphs so as to Unify Directed and Undirected Graphical Models
The two most popular types of graphical model are directed models (Bayesian networks) and undirected models (Markov random fields, or MRFs). Directed and undirected models offer complementary properties in model construction, expressing conditional independencies, expressing arbitrary factorizations of joint distributions, and formulating message-passing inference algorithms. We show that the strengths of these two representations can be combined in a single type of graphical model called a 'factor graph'. Every Bayesian network or MRF can be easily converted to a factor graph that expresses the same conditional independencies, expresses the same factorization of the joint distribution, and can be used for probabilistic inference through application of a single, simple message-passing algorithm. In contrast to chain graphs, where message-passing is implemented on a hypergraph, message-passing can be directly implemented on the factor graph. We describe a modified 'Bayes-ball' algorithm for establishing conditional independence in factor graphs, and we show that factor graphs form a strict superset of Bayesian networks and MRFs. In particular, we give an example of a commonly-used 'mixture of experts' model fragment, whose independencies cannot be represented in a Bayesian network or an MRF, but can be represented in a factor graph. We finish by giving examples of real-world problems that are not well suited to representation in Bayesian networks and MRFs, but are well-suited to representation in factor graphs.
Phase Transition of Tractability in Constraint Satisfaction and Bayesian Network Inference
Identifying tractable subclasses and designing efficient algorithms for these tractable classes are important topics in the study of constraint satisfaction and Bayesian network inference problems. In this paper we investigate the asymptotic average behavior of a typical tractable subclass characterized by the treewidth of the problems. We show that the property of having a bounded treewidth in the constraint satisfaction problem and Bayesian network inference problem has a phase transition that occurs while the underlying structures of problems are still sparse. This implies that algorithms making use of treewidth based structural knowledge only work efficiently in a limited range of random instances. INTRODUCTION It is well known that many NP complete problems have tractable subclasses characterized by certain structural parameters.
Decision Making with Partially Consonant Belief Functions
Giang, Phan H., Shenoy, Prakash P.
This paper studies decision making for Walley's partially consonant belief functions (pcb). In a pcb, the set of foci are partitioned. Within each partition, the foci are nested. The pcb class includes probability functions and possibility functions as extreme cases. Unlike earlier proposals for a decision theory with belief functions, we employ an axiomatic approach. We adopt an axiom system similar in spirit to von Neumann - Morgenstern's linear utility theory for a preference relation on pcb lotteries. We prove a representation theorem for this relation. Utility for a pcb lottery is a combination of linear utility for probabilistic lottery and binary utility for possibilistic lottery.
A Linear Belief Function Approach to Portfolio Evaluation
Liu, Liping, Shenoy, Catherine, Shenoy, Prakash P.
By elaborating on the notion of linear belief functions (Dempster 1990; Liu 1996), we propose an elementary approach to knowledge representation for expert systems using linear belief functions. We show how to use basic matrices to represent market information and financial knowledge, including complete ignorance, statistical observations, subjective speculations, distributional assumptions, linear relations, and empirical asset pricing models. We then appeal to Dempster's rule of combination to integrate the knowledge for assessing an overall belief of portfolio performance, and updating the belief by incorporating additional information. We use an example of three gold stocks to illustrate the approach.
Large-Sample Learning of Bayesian Networks is NP-Hard
Chickering, David Maxwell, Meek, Christopher, Heckerman, David
In this paper, we provide new complexity results for algorithms that learn discrete-variable Bayesian networks from data. Our results apply whenever the learning algorithm uses a scoring criterion that favors the simplest model able to represent the generative distribution exactly. Our results therefore hold whenever the learning algorithm uses a consistent scoring criterion and is applied to a sufficiently large dataset. We show that identifying high-scoring structures is hard, even when we are given an independence oracle, an inference oracle, and/or an information oracle. Our negative results also apply to the learning of discrete-variable Bayesian networks in which each node has at most k parents, for all k > 3.