Bayesian Learning
Exact Inference in Networks with Discrete Children of Continuous Parents
Lerner, Uri, Segal, Eran, Koller, Daphne
Many real life domains contain a mixture of discrete and continuous variables and can be modeled as hybrid Bayesian Networks. Animportant subclass of hybrid BNs are conditional linear Gaussian (CLG) networks, where the conditional distribution of the continuous variables given an assignment to the discrete variables is a multivariate Gaussian. Lauritzen's extension to the clique tree algorithm can be used for exact inference in CLG networks. However, many domains also include discrete variables that depend on continuous ones, and CLG networks do not allow such dependencies to berepresented. No exact inference algorithm has been proposed for these enhanced CLG networks. In this paper, we generalize Lauritzen's algorithm, providing the first "exact" inference algorithm for augmented CLG networks - networks where continuous nodes are conditional linear Gaussians but that also allow discrete children ofcontinuous parents. Our algorithm is exact in the sense that it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to the accuracy obtained by numerical integration used within thealgorithm. When the discrete children are modeled with softmax CPDs (as is the case in many real world domains) the approximation of the continuous distributions using the first two moments is particularly accurate. Our algorithm is simple to implement and often comparable in its complexity to Lauritzen's algorithm. We show empirically that it achieves substantially higher accuracy than previous approximate algorithms.
Inference in Hybrid Networks: Theoretical Limits and Practical Algorithms
An important subclass of hybrid Bayesian networks are those that represent Conditional Linear Gaussian (CLG) distributions --- a distribution with a multivariate Gaussian component for each instantiation of the discrete variables. In this paper we explore the problem of inference in CLGs. We show that inference in CLGs can be significantly harder than inference in Bayes Nets. In particular, we prove that even if the CLG is restricted to an extremely simple structure of a polytree in which every continuous node has at most one discrete ancestor, the inference task is NP-hard.To deal with the often prohibitive computational cost of the exact inference algorithm for CLGs, we explore several approximate inference algorithms. These algorithms try to find a small subset of Gaussians which are a good approximation to the full mixture distribution. We consider two Monte Carlo approaches and a novel approach that enumerates mixture components in order of prior probability. We compare these methods on a variety of problems and show that our novel algorithm is very promising for large, hybrid diagnosis problems.
Hypothesis Management in Situation-Specific Network Construction
Laskey, Kathryn Blackmond, Mahoney, Suzanne M., Wright, Ed
This paper considers the problem of knowledge-based model construction in the presence of uncertainty about the association of domain entities to random variables. Multi-entity Bayesian networks (MEBNs) are defined as a representation for knowledge in domains characterized by uncertainty in the number of relevant entities, their interrelationships, and their association with observables. An MEBN implicitly specifies a probability distribution in terms of a hierarchically structured collection of Bayesian network fragments that together encode a joint probability distribution over arbitrarily many interrelated hypotheses. Although a finite query-complete model can always be constructed, association uncertainty typically makes exact model construction and evaluation intractable. The objective of hypothesis management is to balance tractability against accuracy. We describe an application to the problem of using intelligence reports to infer the organization and activities of groups of military vehicles. Our approach is compared to related work in the tracking and fusion literature.
Improved learning of Bayesian networks
The search space of Bayesian Network structures is usually defined as Acyclic Directed Graphs (DAGs) and the search is done by local transformations of DAGs. But the space of Bayesian Networks is ordered by DAG Markov model inclusion and it is natural to consider that a good search policy should take this into account. First attempt to do this (Chickering 1996) was using equivalence classes of DAGs instead of DAGs itself. This approach produces better results but it is significantly slower. We present a compromise between these two approaches. It uses DAGs to search the space in such a way that the ordering by inclusion is taken into account. This is achieved by repetitive usage of local moves within the equivalence class of DAGs. We show that this new approach produces better results than the original DAGs approach without substantial change in time complexity. We present empirical results, within the framework of heuristic search and Markov Chain Monte Carlo, provided through the Alarm dataset.
On characterizing Inclusion of Bayesian Networks
Kocka, Tomas, Bouckaert, Remco R., Studeny, Milan
Every directed acyclic graph (DAG) over a finite non-empty set of variables (= nodes) N induces an independence model over N, which is a list of conditional independence statements over N.The inclusion problem is how to characterize (in graphical terms) whether all independence statements in the model induced by a DAG K are in the model induced by a second DAG L. Meek (1997) conjectured that this inclusion holds iff there exists a sequence of DAGs from L to K such that only certain 'legal' arrow reversal and 'legal' arrow adding operations are performed to get the next DAG in the sequence.In this paper we give several characterizations of inclusion of DAG models and verify Meek's conjecture in the case that the DAGs K and L differ in at most one adjacency. As a warming up a rigorous proof of well-known graphical characterizations of equivalence of DAGs, which is a highly related problem, is given.
Estimating Well-Performing Bayesian Networks using Bernoulli Mixtures
A novel method for estimating Bayesian network (BN) parameters from data is presented which provides improved performance on test data. Previous research has shown the value of representing conditional probability distributions (CPDs) via neural networks(Neal 1992), noisy-OR gates (Neal 1992, Diez 1993)and decision trees (Friedman and Goldszmidt 1996).The Bernoulli mixture network (BMN) explicitly represents the CPDs of discrete BN nodes as mixtures of local distributions,each having a different set of parents.This increases the space of possible structures which can be considered,enabling the CPDs to have finer-grained dependencies.The resulting estimation procedure induces a modelthat is better able to emulate the underlying interactions occurring in the data than conventional conditional Bernoulli network models.The results for artificially generated data indicate that overfitting is best reduced by restricting the complexity of candidate mixture substructures local to each node. Furthermore, mixtures of very simple substructures can perform almost as well as more complex ones.The BMN is also applied to data collected from an online adventure game with an application to keyhole plan recognition. The results show that the BMN-based model brings a dramatic improvement in performance over a conventional BN model.
A Bayesian Approach to Tackling Hard Computational Problems
Horvitz, Eric J., Ruan, Yongshao, Gomes, Carla P., Kautz, Henry, Selman, Bart, Chickering, David Maxwell
We are developing a general framework for using learned Bayesian models for decision-theoretic control of search and reasoningalgorithms. We illustrate the approach on the specific task of controlling both general and domain-specific solvers on a hard class of structured constraint satisfaction problems. A successful strategyfor reducing the high (and even infinite) variance in running time typically exhibited by backtracking search algorithms is to cut off and restart the search if a solution is not found within a certainamount of time. Previous work on restart strategies have employed fixed cut off values. We show how to create a dynamic cut off strategy by learning a Bayesian model that predicts the ultimate length of a trial based on observing the early behavior of the search algorithm. Furthermore, we describe the general conditions under which a dynamic restart strategy can outperform the theoretically optimal fixed strategy.
Enumerating Markov Equivalence Classes of Acyclic Digraph Models
Gillispie, Steven B., Perlman, Michael D.
Graphical Markov models determined by acyclic digraphs (ADGs), also called directed acyclic graphs (DAGs), are widely studied in statistics, computer science (as Bayesian networks), operations research (as influence diagrams), and many related fields. Because different ADGs may determine the same Markov equivalence class, it long has been of interest to determine the efficiency gained in model specification and search by working directly with Markov equivalence classes of ADGs rather than with ADGs themselves. A computer program was written to enumerate the equivalence classes of ADG models as specified by Pearl & Verma's equivalence criterion. The program counted equivalence classes for models up to and including 10 vertices. The ratio of number of classes to ADGs appears to approach an asymptote of about 0.267. Classes were analyzed according to number of edges and class size. By edges, the distribution of number of classes approaches a Gaussian shape. By class size, classes of size 1 are most common, with the proportions for larger sizes initially decreasing but then following a more irregular pattern. The maximum number of classes generated by any undirected graph was found to increase approximately factorially. The program also includes a new variation of orderly algorithm for generating undirected graphs.
Multivariate Information Bottleneck
Friedman, Nir, Mosenzon, Ori, Slonim, Noam, Tishby, Naftali
The Information bottleneck method is an unsupervised non-parametric data organization technique. Given a joint distribution P(A,B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. The information bottleneck has already been applied to document classification, gene expression, neural code, and spectral analysis. In this paper, we introduce a general principled framework for multivariate extensions of the information bottleneck method. This allows us to consider multiple systems of data partitions that are inter-related. Our approach utilizes Bayesian networks for specifying the systems of clusters and what information each captures. We show that this construction provides insight about bottleneck variations and enables us to characterize solutions of these variations. We also present a general framework for iterative algorithms for constructing solutions, and apply it to several examples.
Learning the Dimensionality of Hidden Variables
A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. Detecting hidden variables poses two problems: determining the relations to other variables in the model and determining the number of states of the hidden variable. In this paper, we address the latter problem in the context of Bayesian networks. We describe an approach that utilizes a score-based agglomerative state-clustering. As we show, this approach allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We show how to extend this procedure to deal with multiple interacting hidden variables. We demonstrate the effectiveness of this approach by evaluating it on synthetic and real-life data. We show that our approach learns models with hidden variables that generalize better and have better structure than previous approaches.