Bayesian Learning
Stochastic Simulation of Bayesian Belief Networks
Chin, Homer L., Cooper, Gregory F.
This paper examines Bayesian belief network inference using simulation as a method for computing the posterior probabilities of network variables. Specifically, it examines the use of a method described by Henrion, called logic sampling, and a method described by Pearl, called stochastic simulation. We first review the conditions under which logic sampling is computationally infeasible. Such cases motivated the development of the Pearl's stochastic simulation algorithm. We have found that this stochastic simulation algorithm, when applied to certain networks, leads to much slower than expected convergence to the true posterior probabilities. This behavior is a result of the tendency for local areas in the network to become fixed through many simulation cycles. The time required to obtain significant convergence can be made arbitrarily long by strengthening the probabilistic dependency between nodes. We propose the use of several forms of graph modification, such as graph pruning, arc reversal, and node reduction, in order to convert some networks into formats that are computationally more efficient for simulation.
Bayesian Inference in Model-Based Machine Vision
Binford, Thomas O., Levitt, Tod S., Mann, Wallace B.
Advanced Decision Systems Abstract We present a thorough integration of hierarchical Bayesian inference with comprehensive physical representation of objects and their relations in a system for reasoning with geometry in machine vision. Bayesian inference provides a framework (or accruing probabilities to rank order hypotheses. This is a preliminary version of visual interpretation in SUCCESSOR, an intelligent, model-based vision system integrating multiple sensors. Introduction Our design for machine vision uses an evidential accrual process, a. beginning from representation and database of a priori models o(physical objects and their photometric, geometric, and functional properties, together with their relationships and environment, b. predicting observables using models of sensors and perceptual measurement processes; c. making measurements of corresponding observables, measuring image evidence for features of objects and structures of features such as edges, vertices and regions; d. generating hypotheses of instances o(objects from those measurements and predictions; and In range imagery, measurements are 3d. There is still a difficult stage of segmenting and estimating 3d relations that disclose object structure. In 2d images, there is an additional inference from 2d projected image evidence to 3d interpretation of surfaces. System structure tends to break up into a natural hierarchy of representation and processing [Binford 80).
Bayesian Prediction for Artificial Intelligence
Self, Matthew, Cheeseman, Peter
This paper shows that the common method used for making predictions under uncertainty in A1 and science is in error. This method is to use currently available data to select the best model from a given class of models-this process is called abduction-and then to use this model to make predictions about future data. The correct method requires averaging over all the models to make a prediction-we call this method transduction. Using transduction, an AI system will not give misleading results when basing predictions on small amounts of data, when no model is clearly best. For common classes of models we show that the optimal solution can be given in closed form.
Belief in Belief Functions: An Examination of Shafer's Canonical Examples
EXAMINATION OF SHAFER'S CANONICAL EXAMPLES Kathryn Blackmond Laskey Decision Science Consortium, Inc. 7700 Leesburg Pike, Suite 421 Falls Church, VA 22043 1 Abstract In the canonical examples underlying Shafer-Dempster theory, beliefs over the hypotheses of interest are derived from a probability model for a set of auxiliary hypotheses. Beliefs are derived via a compatibility relation connecting the auxiliary hypotheses to subsets of the primary hypotheses. A belief function differs from a Bayesian probability model in that one does not condition on those parts of the evidence for which no probabilities are specified. The significance of this difference in conditioning assumptions is illustrated with two examples giving rise to identical belief functions but different Bayesian probability distributions. Introduction The artificial intelligence community is in the midst of a lively debate over the representation and manipulation of uncertainty.
Is Shafer General Bayes?
This paper examines the relationship between Shafer's belief functions and convex sets of probability distributions. Kyburg's (1986) result showed that belief function models form a subset of the class of closed convex probability distributions. This paper emphasizes the importance of Kyburg's result by looking at simple examples involving Bernoulli trials. Furthermore, it is shown that many convex sets of probability distributions generate the same belief function in the sense that they support the same lower and upper values. This has implications for a decision theoretic extension. Dempster's rule of combination is also compared with Bayes' rule of conditioning.
Network Detection Theory and Performance
Smith, Steven T., Senne, Kenneth D., Philips, Scott, Kao, Edward K., Bernstein, Garrett
Network detection is an important capability in many areas of applied research in which data can be represented as a graph of entities and relationships. Oftentimes the object of interest is a relatively small subgraph in an enormous, potentially uninteresting background. This aspect characterizes network detection as a "big data" problem. Graph partitioning and network discovery have been major research areas over the last ten years, driven by interest in internet search, cyber security, social networks, and criminal or terrorist activities. The specific problem of network discovery is addressed as a special case of graph partitioning in which membership in a small subgraph of interest must be determined. Algebraic graph theory is used as the basis to analyze and compare different network detection methods. A new Bayesian network detection framework is introduced that partitions the graph based on prior information and direct observations. The new approach, called space-time threat propagation, is proved to maximize the probability of detection and is therefore optimum in the Neyman-Pearson sense. This optimality criterion is compared to spectral community detection approaches which divide the global graph into subsets or communities with optimal connectivity properties. We also explore a new generative stochastic model for covert networks and analyze using receiver operating characteristics the detection performance of both classes of optimal detection techniques.
Bayesian Compressed Regression
Guhaniyogi, Rajarshi, Dunson, David B.
As an alternative to variable selection or shrinkage in high dimensional regression, we propose to randomly compress the predictors prior to analysis. This dramatically reduces storage and computational bottlenecks, performing well when the predictors can be projected to a low dimensional linear subspace with minimal loss of information about the response. As opposed to existing Bayesian dimensionality reduction approaches, the exact posterior distribution conditional on the compressed data is available analytically, speeding up computation by many orders of magnitude while also bypassing robustness issues due to convergence and mixing problems with MCMC. Model averaging is used to reduce sensitivity to the random projection matrix, while accommodating uncertainty in the subspace dimension. Strong theoretical support is provided for the approach by showing near parametric convergence rates for the predictive density in the large p small n asymptotic paradigm. Practical performance relative to competitors is illustrated in simulations and real data applications.
Deep Gaussian Processes
Damianou, Andreas C., Lawrence, Neil D.
In this paper we introduce deep Gaussian process (GP) models. Deep GPs are a deep belief network based on Gaussian process mappings. The data is modeled as the output of a multivariate GP. The inputs to that Gaussian process are then governed by another GP. A single layer model is equivalent to a standard GP or the GP latent variable model (GP-LVM). We perform inference in the model by approximate variational marginalization. This results in a strict lower bound on the marginal likelihood of the model which we use for model selection (number of layers and nodes per layer). Deep belief networks are typically applied to relatively large data sets using stochastic gradient descent for optimization. Our fully Bayesian treatment allows for the application of deep models even when data is scarce. Model selection by our variational bound shows that a five layer hierarchy is justified even when modelling a digit data set containing only 150 examples.
ARCO1: An Application of Belief Networks to the Oil Market
Belief networks are a new, potentially important, class of knowledge-based models. ARCO1, currently under development at the Atlantic Richfield Company (ARCO) and the University of Southern California (USC), is the most advanced reported implementation of these models in a financial forecasting setting. ARCO1's underlying belief network models the variables believed to have an impact on the crude oil market. A pictorial market model-developed on a MAC II- facilitates consensus among the members of the forecasting team. The system forecasts crude oil prices via Monte Carlo analyses of the network. Several different models of the oil market have been developed; the system's ability to be updated quickly highlights its flexibility.
On the Generation of Alternative Explanations with Implications for Belief Revision
Department of Computer Science Brown University Providence, RI 02912 Abstract In general, the best explanation for a given observation makes no promises on how good it is with respect to other alternative explanations. A major deficiency of message-passing schemes for belief revision in Bayesian networks is their inability to generate alternatives beyond the second best. In this paper, we present a general approach based on linear constraint systems that naturally generates alternative explanations in an orderly and highly efficient manner. This approach is then applied to cost-based abduction problems as well as belief revision in Bayesian networks. INTRODUCTION We are constantly faced with the problem of explaining the observations we have gathered with our senses. Our explanations are constructed by assuming certain facts or hypotheses which support our observations. For example, suppose I decide to phone my friend Tony at the office.