Bayesian Learning
A New Model of Plan Recognition
Goldman, Robert P., Geib, Christopher W., Miller, Christopher A.
We present a new abductive, probabilistic theory of plan recognition. This model differs from previous plan recognition theories in being centered around a model of plan execution: most previous methods have been based on plans as formal objects or on rules describing the recognition process. We show that our new model accounts for phenomena omitted from most previous plan recognition theories: notably the cumulative effect of a sequence of observations of partially-ordered, interleaved plans and the effect of context on plan adoption. The model also supports inferences about the evolution of plan execution in situations where another agent intervenes in plan execution. This facility provides support for using plan recognition to build systems that will intelligently assist a user.
On Transformations between Probability and Spohnian Disbelief Functions
Giang, Phan H., Shenoy, Prakash P.
In this paper, we analyze the relationship between probability and Spohn's theory for representation of uncertain beliefs. Using the intuitive idea that the more probable a proposition is, the more believable it is, we study transformations from probability to Spohnian disbelief and vice versa. The transformations described in this paper are different from those described in the literature. In particular, the former satisfy the principles of ordinal congruence while the latter do not. Such transformations between probability and Spohn's calculi can contribute to (1) a clarification of the semantics of nonprobabilistic degrees of uncertain belief, and (2) the construction of a decision theory for such calculi. In practice, the transformations will allow a meaningful combination of more than one calculus in different stages of using an expert system, such as knowledge acquisition, inference, and interpretation of results.
Quantifier Elimination for Statistical Problems
Geiger, Dan, Meek, Christopher
Recent improvements on Tarski's procedure for quantifier elimination in the first-order theory of real numbers make it feasible to solve small instances of the following problems completely automatically: 1. Listing all equality and inequality constraints implied by a graphical model with hidden variables. 2. Comparing graphical models with hidden variables (i.e., model equivalence, inclusion, and overlap). 3. Answering questions about the identification of a model or portion of a model, and about bounds on quantities derived from a model. 4. Determining whether a given set of independence assertions. We discuss the foundation of quantifier elimination and demonstrate its application to these problems.
Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm
Friedman, Nir, Nachman, Iftach, Pe'er, Dana
Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the search space is extremely large, such search procedures can spend most of the time examining candidates that are extremely unreasonable. This problem becomes critical when we deal with data sets that are large either in the number of instances, or the number of attributes. In this paper, we introduce an algorithm that achieves faster learning by restricting the search space. This iterative algorithm restricts the parents of each variable to belong to a small subset of candidates. We then search for a network that satisfies these constraints. The learned network is then used for selecting better candidates for the next iteration. We evaluate this algorithm both on synthetic and real-life data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures.
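The candidate-restriction step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how candidates are chosen, so the code below assumes pairwise empirical mutual information as the relevance measure; the function names `mutual_information` and `select_candidates` and the parameter `k` are hypothetical and used only for illustration. The subsequent constrained search and the re-selection of candidates from the learned network are not shown.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi


def select_candidates(data, k):
    """Assumed selection rule: for each variable, keep the k other
    variables with the highest mutual information as candidate parents.

    data maps a variable name to its list of observed discrete values.
    """
    candidates = {}
    for target in data:
        scored = sorted(
            ((mutual_information(data[target], data[other]), other)
             for other in data if other != target),
            reverse=True,
        )
        candidates[target] = [name for _, name in scored[:k]]
    return candidates


# Toy data: B copies A, while C is empirically independent of A.
data = {
    "A": [0, 1, 0, 1, 0, 1, 0, 1],
    "B": [0, 1, 0, 1, 0, 1, 0, 1],
    "C": [0, 0, 1, 1, 0, 0, 1, 1],
}
candidates = select_candidates(data, k=1)
```

A structure search would then consider only parent sets drawn from `candidates[X]` for each variable `X`, which is what shrinks the search space.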
Data Analysis with Bayesian Networks: A Bootstrap Approach
Friedman, Nir, Goldszmidt, Moises, Wyner, Abraham
In recent years there has been significant progress in algorithms and methods for inducing Bayesian networks from data. However, in complex data analysis problems, we need to go beyond being satisfied with inducing networks with high scores. We need to provide confidence measures on features of these networks: Is the existence of an edge between two nodes warranted? Is the Markov blanket of a given node robust? Can we say something about the ordering of the variables? We should be able to address these questions, even when the amount of data is not enough to induce a high scoring network. In this paper we propose Efron's Bootstrap as a computationally efficient approach for answering these questions. In addition, we propose to use these confidence measures to induce better structures from the data, and to detect the presence of latent variables.
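As a rough illustration of the bootstrap idea in this abstract, the sketch below resamples the records with replacement, re-runs a learner on each resample, and reports how often each feature (here, an undirected edge) appears. The names `bootstrap_confidence` and `toy_learner` are hypothetical, and `toy_learner` is a deliberately crude stand-in (it links two variables when they agree in more than 80% of records), not a Bayesian network induction procedure.

```python
import random
from itertools import combinations


def bootstrap_confidence(records, learn_features, n_resamples=200, seed=0):
    """Fraction of bootstrap resamples in which each feature is learned.

    learn_features is any callable mapping a list of records to a set of
    hashable features (e.g. edges); it is an assumed plug-in point.
    """
    rng = random.Random(seed)
    n = len(records)
    counts = {}
    for _ in range(n_resamples):
        # Nonparametric bootstrap: draw n records with replacement.
        sample = [records[rng.randrange(n)] for _ in range(n)]
        for feature in learn_features(sample):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: c / n_resamples for feature, c in counts.items()}


def toy_learner(sample):
    """Placeholder learner: edge (i, j) if columns i and j agree > 80%."""
    n_vars = len(sample[0])
    edges = set()
    for i, j in combinations(range(n_vars), 2):
        agreement = sum(1 for r in sample if r[i] == r[j]) / len(sample)
        if agreement > 0.8:
            edges.add((i, j))
    return edges


# Toy records: columns 0 and 1 always agree; column 2 is unrelated.
records = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)] * 10
confidence = bootstrap_confidence(records, toy_learner)
```

Features whose confidence stays close to 1 across resamples are the ones the data support robustly; in practice `toy_learner` would be replaced by an actual structure learner that returns the features of interest.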
A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data
Dash, Denver, Druzdzel, Marek J.
We present a hybrid constraint-based/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraint-based techniques. Each essential graph is then converted into a directed acyclic graph and scored using a Bayesian scoring metric. Two variants of the algorithm are developed and tested using data from randomly generated networks of sizes from 15 to 45 nodes with data sizes ranging from 250 to 2000 records. Both variants are compared to, and found to consistently outperform, two variations of greedy search with restarts.
Loglinear models for first-order probabilistic reasoning
Recent work on loglinear models in probabilistic constraint logic programming is applied to first-order probabilistic reasoning. Probabilities are defined directly on the proofs of atomic formulae, and by marginalisation on the atomic formulae themselves. We use Stochastic Logic Programs (SLPs) composed of labelled and unlabelled definite clauses to define the proof probabilities. We have a conservative extension of first-order reasoning, so that, for example, there is a one-one mapping between logical and random variables. We show how, in this framework, Inductive Logic Programming (ILP) can be used to induce the features of a loglinear model from data. We also compare the presented framework with other approaches to first-order probabilistic reasoning.
Causal Discovery from a Mixture of Experimental and Observational Data
Cooper, Gregory F., Yoo, Changwon
This paper describes a Bayesian method for combining an arbitrary mixture of observational and experimental data in order to learn causal Bayesian networks. Observational data are passively observed. Experimental data, such as that produced by randomized controlled trials, result from the experimenter manipulating one or more variables (typically randomly) and observing the states of other variables. The paper presents a Bayesian method for learning the causal structure and parameters of the underlying causal process that is generating the data, given that (1) the data contains a mixture of observational and experimental case records, and (2) the causal process is modeled as a causal Bayesian network. This learning method was applied using as input various mixtures of experimental and observational data that were generated from the ALARM causal Bayesian network. In these experiments, the absolute and relative quantities of experimental and observational data were varied systematically. For each of these training datasets, the learning method was applied to predict the causal structure and to estimate the causal parameters that exist among randomly selected pairs of nodes in ALARM that are not confounded. The paper reports how these structure predictions and parameter estimates compare with the true causal structures and parameters as given by the ALARM network.
Comparing Bayesian Network Classifiers
In this paper, we empirically evaluate algorithms for learning four types of Bayesian network (BN) classifiers - Naive-Bayes, tree augmented Naive-Bayes, BN augmented Naive-Bayes and general BNs, where the latter two are learned using two variants of a conditional-independence (CI) based BN-learning algorithm. Experimental results show the obtained classifiers, learned using the CI based algorithms, are competitive with (or superior to) the best known classifiers, based on both Bayesian networks and other formalisms; and that the computational time for learning and using these classifiers is relatively small. Moreover, these results also suggest a way to learn yet more effective classifiers; we demonstrate empirically that this new algorithm does work as expected. Collectively, these results argue that BN classifiers deserve more attention in machine learning and data mining communities.
Discovering the Hidden Structure of Complex Dynamic Systems
Boyen, Xavier, Friedman, Nir, Koller, Daphne
Dynamic Bayesian networks provide a compact and natural representation for complex dynamic systems. However, in many cases, there is no expert available from whom a model can be elicited. Learning provides an alternative approach for constructing models of dynamic systems. In this paper, we address some of the crucial computational aspects of learning the structure of dynamic systems, particularly those where some relevant variables are partially observed or even entirely unknown. Our approach is based on the Structural Expectation Maximization (SEM) algorithm. The main computational cost of the SEM algorithm is the gathering of expected sufficient statistics. We propose a novel approximation scheme that allows these sufficient statistics to be computed efficiently. We also investigate the fundamental problem of discovering the existence of hidden variables without exhaustive and expensive search. Our approach is based on the observation that, in dynamic systems, ignoring a hidden variable typically results in a violation of the Markov property. Thus, our algorithm searches for such violations in the data, and introduces hidden variables to explain them. We provide empirical results showing that the algorithm is able to learn the dynamics of complex systems in a computationally tractable way.