Heckerman, David
The Role of Calculi in Uncertain Inference Systems
Wellman, Michael P., Heckerman, David
Much of the controversy about methods for automated decision making has focused on specific calculi for combining beliefs or propagating uncertainty. We broaden the debate by (1) exploring the constellation of secondary tasks surrounding any primary decision problem, and (2) identifying knowledge engineering concerns that present additional representational tradeoffs. We argue on pragmatic grounds that the attempt to support all of these tasks within a single calculus is misguided. In the process, we note several uncertain reasoning objectives that conflict with the Bayesian ideal of complete specification of probabilities and utilities. In response, we advocate treating the uncertainty calculus as an object language for reasoning mechanisms that support the secondary tasks. Arguments against Bayesian decision theory are weakened when the calculus is relegated to this role. Architectures for uncertainty handling that take statements in the calculus as objects to be reasoned about offer the prospect of retaining normative status with respect to decision making while supporting the other tasks in uncertain reasoning.
Probabilistic Interpretations for MYCIN's Certainty Factors
Heckerman, David
This paper examines the quantities used by MYCIN to reason with uncertainty, called certainty factors. It is shown that the original definition of certainty factors is inconsistent with the functions used in MYCIN to combine the quantities. This inconsistency is used to argue for a redefinition of certainty factors in terms of the intuitively appealing desiderata associated with the combining functions. It is shown that this redefinition accommodates an unlimited number of probabilistic interpretations. These interpretations are shown to be monotonic transformations of the likelihood ratio p(E|H)/p(E|¬H). The construction of these interpretations provides insight into the assumptions implicit in the certainty factor model. In particular, it is shown that if uncertainty is to be propagated through an inference network in accordance with the desiderata, evidence must be conditionally independent given the hypothesis and its negation, and the inference network must have a tree structure. It is emphasized that assumptions implicit in the model are rarely true in practical applications. Methods for relaxing the assumptions are suggested.
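As a concrete illustration, the Python sketch below maps the likelihood ratio lambda = p(E|H)/p(E|¬H) to a value in (-1, 1) through one monotone transformation of the kind discussed above; the particular mapping is an illustrative assumption, not necessarily the definition used in the paper.

def certainty_factor(p_e_given_h, p_e_given_not_h):
    """Map a likelihood ratio to a certainty-factor-like quantity in (-1, 1).

    Illustrative monotone transformation of lambda = p(E|H) / p(E|~H); the
    exact mapping is an assumption for this sketch, not quoted from the paper.
    """
    lam = p_e_given_h / p_e_given_not_h
    if lam >= 1.0:
        return (lam - 1.0) / lam      # confirming evidence: value in [0, 1)
    return lam - 1.0                  # disconfirming evidence: value in (-1, 0)

# Evidence three times as likely under H as under its negation: lambda = 3, value = 2/3.
print(certainty_factor(0.6, 0.2))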
A Perspective on Confidence and Its Use in Focusing Attention During Knowledge Acquisition
Heckerman, David, Jimison, Holly B.
We present a representation of partial confidence in belief and preference that is consistent with the tenets of decision theory. The fundamental insight underlying the representation is that if a person is not completely confident in a probability or utility assessment, additional modeling of the assessment may improve decisions to which it is relevant. We show how a traditional decision-analytic approach can be used to balance the benefits of additional modeling with the associated costs. The approach can be used during knowledge acquisition to focus the attention of a knowledge engineer or expert on parts of a decision model that deserve additional refinement.
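A rough Python illustration of the cost-benefit idea (assuming, purely for the sketch, that second-order uncertainty about a probability assessment is discretized into a few weighted candidate values): refine the assessment only when the expected gain in decision quality exceeds the cost of the additional modeling. All names and numbers are hypothetical.

def eu(action, p, utilities):
    """Expected utility of an action when the uncertain event has probability p."""
    u_event, u_no_event = utilities[action]
    return p * u_event + (1 - p) * u_no_event

def value_of_refinement(candidates, utilities):
    """Expected gain from learning which candidate value the refined assessment takes.

    candidates: list of (p, weight) pairs -- a hypothetical discretization of the
    expert's second-order uncertainty about the assessment.
    """
    actions = utilities.keys()
    mean_p = sum(w * p for p, w in candidates)
    act_now = max(eu(a, mean_p, utilities) for a in actions)
    act_after = sum(w * max(eu(a, p, utilities) for a in actions) for p, w in candidates)
    return act_after - act_now

# Hypothetical two-action decision: refinement is worthwhile only if the expected
# gain exceeds the cost of the additional modeling effort.
utilities = {"treat": (90.0, 40.0), "wait": (10.0, 100.0)}
candidates = [(0.1, 0.5), (0.4, 0.3), (0.8, 0.2)]
gain = value_of_refinement(candidates, utilities)
refinement_cost = 3.0
print(round(gain, 2), "refine" if gain > refinement_cost else "act now")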
The Compilation of Decision Models
Heckerman, David, Breese, John S., Horvitz, Eric J.
We introduce and analyze the problem of the compilation of decision models from a decision-theoretic perspective. The techniques described allow us to evaluate various configurations of compiled knowledge given the nature of evidential relationships in a domain, the utilities associated with alternative actions, the costs of run-time delays, and the costs of memory. We describe procedures for selecting a subset of the total observations available to be incorporated into a compiled situation-action mapping, in the context of a binary decision with conditional independence of evidence. The methods allow us to incrementally select the best pieces of evidence to add to the set of compiled knowledge in an engineering setting. After presenting several approaches to compilation, we exercise one of the methods to provide insight into the relationship between the distribution over weights of evidence and the preferred degree of compilation.
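A small Python stand-in for this kind of incremental selection (not the paper's procedure): for a binary hypothesis with conditionally independent binary findings, it greedily adds to the compiled set the finding whose inclusion buys the most expected utility, and stops when the gain no longer covers a per-finding memory cost. The model, numbers, and stopping rule are hypothetical.

from itertools import product

def posterior(prior, config, likelihoods):
    """P(H=1 | the findings in config), findings conditionally independent given H.

    config: dict finding -> 0/1; likelihoods: finding -> (P(f=1|H=1), P(f=1|H=0)).
    """
    num, den = prior, 1.0 - prior
    for f, value in config.items():
        p1, p0 = likelihoods[f]
        num *= p1 if value else (1.0 - p1)
        den *= p0 if value else (1.0 - p0)
    return num / (num + den)

def prob_config(prior, config, likelihoods):
    """Marginal probability of observing this configuration of findings."""
    num, den = prior, 1.0 - prior
    for f, value in config.items():
        p1, p0 = likelihoods[f]
        num *= p1 if value else (1.0 - p1)
        den *= p0 if value else (1.0 - p0)
    return num + den

def compiled_value(subset, prior, likelihoods, utilities):
    """Expected utility of a situation-action table compiled over 'subset'."""
    total = 0.0
    for values in product([0, 1], repeat=len(subset)):
        config = dict(zip(subset, values))
        p = posterior(prior, config, likelihoods)
        best = max(p * u1 + (1 - p) * u0 for u1, u0 in utilities.values())
        total += prob_config(prior, config, likelihoods) * best
    return total

# Hypothetical numbers: greedily add the finding whose inclusion buys the most
# expected utility, stopping when the gain no longer covers the memory cost.
prior = 0.2
likelihoods = {"f1": (0.9, 0.3), "f2": (0.7, 0.4), "f3": (0.6, 0.55)}
utilities = {"act": (100.0, 20.0), "pass": (0.0, 90.0)}
memory_cost_per_finding = 0.5

compiled, remaining = [], set(likelihoods)
while remaining:
    base = compiled_value(compiled, prior, likelihoods, utilities)
    gains = {f: compiled_value(compiled + [f], prior, likelihoods, utilities) - base
             for f in remaining}
    best = max(gains, key=gains.get)
    if gains[best] <= memory_cost_per_finding:
        break
    compiled.append(best)
    remaining.remove(best)
print(compiled)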
An Axiomatic Framework for Belief Updates
Heckerman, David
In the 1940's, a physicist named Cox provided the first formal justification for the axioms of probability based on the subjective or Bayesian interpretation. He showed that if a measure of belief satisfies several fundamental properties, then the measure must be some monotonic transformation of a probability. In this paper, measures of change in belief or belief updates are examined. In the spirit of Cox, properties for a measure of change in belief are enumerated. It is shown that if a measure satisfies these properties, it must satisfy other restrictive conditions. For example, it is shown that belief updates in a probabilistic context must be equal to some monotonic transformation of a likelihood ratio. It is hoped that this formal explication of the belief update paradigm will facilitate critical discussion and useful extensions of the approach.
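A worked instance of this result comes from the odds form of Bayes' rule: O(H|E) = lambda * O(H), where O(H) = p(H)/p(¬H) and lambda = p(E|H)/p(E|¬H). Taking logarithms, log O(H|E) - log O(H) = log lambda, so the change in log-odds induced by evidence E is itself a monotonic transformation of the likelihood ratio, consistent with the restriction derived in the paper.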
A Combination of Cutset Conditioning with Clique-Tree Propagation in the Pathfinder System
Suermondt, Jaap, Cooper, Gregory F., Heckerman, David
Cutset conditioning and clique-tree propagation are two popular methods for performing exact probabilistic inference in Bayesian belief networks. Cutset conditioning is based on decomposition of a subset of network nodes, whereas clique-tree propagation depends on aggregation of nodes. We describe a means to combine cutset conditioning and clique-tree propagation in an approach called aggregation after decomposition (AD). We discuss the application of the AD method in the Pathfinder system, a medical expert system that offers assistance with diagnosis in hematopathology.
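The conditioning half of such a combination rests on the identity P(x|e) = sum over cutset instantiations c of P(x|e,c) P(c|e): each instantiation of the cutset simplifies the network, exact inference is run on the simplified network, and the results are mixed. The Python sketch below shows only this outer mixing loop; clique_tree_posterior and clique_tree_joint are hypothetical placeholders for an exact clique-tree engine, and the sketch is a generic illustration of conditioning, not the specific AD procedure of the paper.

from itertools import product

def conditioned_posterior(query, evidence, cutset_vars, clique_tree_posterior,
                          clique_tree_joint):
    """Mix exact results over cutset instantiations: P(q|e) = sum_c P(q|e,c) P(c|e).

    clique_tree_posterior(query, evidence) and clique_tree_joint(evidence) are
    hypothetical placeholders for an exact clique-tree inference routine applied
    to the network simplified by the cutset instantiation folded into 'evidence'.
    """
    weighted, normalizer = {}, 0.0
    for values in product([0, 1], repeat=len(cutset_vars)):
        instantiation = dict(zip(cutset_vars, values))
        extended = {**evidence, **instantiation}
        weight = clique_tree_joint(extended)          # proportional to P(c, e)
        posterior = clique_tree_posterior(query, extended)
        for state, p in posterior.items():
            weighted[state] = weighted.get(state, 0.0) + weight * p
        normalizer += weight
    return {state: p / normalizer for state, p in weighted.items()}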
A Tractable Inference Algorithm for Diagnosing Multiple Diseases
Heckerman, David
We examine a probabilistic model for the diagnosis of multiple diseases. In the model, diseases and findings are represented as binary variables. Also, diseases are marginally independent, findings are conditionally independent given disease instances, and diseases interact to produce findings via a noisy OR-gate. An algorithm, called quickscore, for computing the posterior probability of each disease given a set of observed findings is presented. The time complexity of the algorithm is O(n m^- 2^(m^+)), where n is the number of diseases, m^+ is the number of positive findings, and m^- is the number of negative findings. Although the time complexity of quickscore is exponential in the number of positive findings, the algorithm is useful in practice because the number of observed positive findings is usually far less than the number of diseases under consideration. Performance results for quickscore applied to a probabilistic version of Quick Medical Reference (QMR) are provided.
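The source of the 2^(m^+) factor is an inclusion-exclusion sum over subsets of the positive findings, under which each term factors across diseases. The Python sketch below illustrates that idea for a leak-free two-level noisy-OR model; variable names are mine, and the bookkeeping optimizations of the full algorithm are omitted.

from itertools import chain, combinations

def subsets(items):
    """All subsets of a list (the source of the 2^(m^+) factor)."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def joint_evidence(priors, q, pos, neg, clamp=None):
    """P(positive findings pos, negative findings neg [, disease clamp present]).

    priors[i] = P(d_i = 1); q[i][j] = P(finding j is caused by disease i alone).
    Uses inclusion-exclusion over subsets of the positive findings, so each term
    factors across diseases (a leak-free noisy-OR model is assumed here).
    """
    total = 0.0
    for s in subsets(pos):
        absent = list(neg) + list(s)    # findings treated as uncaused in this term
        term = 1.0
        for i, prior in enumerate(priors):
            stay_negative = 1.0
            for j in absent:
                stay_negative *= 1.0 - q[i][j]
            if clamp == i:
                term *= prior * stay_negative
            else:
                term *= (1.0 - prior) + prior * stay_negative
        total += (-1.0) ** len(s) * term
    return total

def quickscore_posteriors(priors, q, pos, neg):
    """Posterior probability of each disease given the observed findings."""
    evidence = joint_evidence(priors, q, pos, neg)
    return [joint_evidence(priors, q, pos, neg, clamp=i) / evidence
            for i in range(len(priors))]

# Tiny hypothetical example: two diseases, findings 0 and 1 observed positive,
# finding 2 observed negative.
priors = [0.1, 0.05]
q = [[0.8, 0.3, 0.2],    # q[disease][finding]
     [0.4, 0.7, 0.1]]
print(quickscore_posteriors(priors, q, pos=[0, 1], neg=[2]))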
The Myth of Modularity in Rule-Based Systems
Heckerman, David, Horvitz, Eric J.
In this paper, we examine the concept of modularity, an often-cited advantage of the rule-based representation methodology. We argue that the notion of modularity consists of two distinct concepts, which we call syntactic modularity and semantic modularity. We argue that when reasoning under certainty, it is reasonable to regard the rule-based approach as both syntactically and semantically modular. However, we argue that in the case of plausible reasoning, rules are syntactically modular but are rarely semantically modular. To illustrate this point, we examine a particular approach for managing uncertainty in rule-based systems called the MYCIN certainty factor model. We formally define the concept of semantic modularity with respect to the certainty factor model and discuss logical consequences of the definition. We show that the assumption of semantic modularity imposes strong restrictions on rules in a knowledge base. We argue that such restrictions are rarely valid in practical applications. Finally, we suggest how the concept of semantic modularity can be relaxed in a manner that makes it appropriate for plausible reasoning.
Learning Gaussian Networks
Geiger, Dan, Heckerman, David
We describe algorithms for learning Bayesian networks from a combination of user knowledge and statistical data. The algorithms have two components: a scoring metric and a search procedure. The scoring metric takes a network structure, statistical data, and a user's prior knowledge, and returns a score proportional to the posterior probability of the network structure given the data. The search procedure generates networks for evaluation by the scoring metric. Previous work has concentrated on metrics for domains containing only discrete variables, under the assumption that data represents a multinomial sample. In this paper, we extend this work, developing scoring metrics for domains containing all continuous variables or a mixture of discrete and continuous variables, under the assumption that continuous data is sampled from a multivariate normal distribution. Our work extends traditional statistical approaches for identifying vanishing regression coefficients in that we identify two important assumptions, called event equivalence and parameter modularity, that when combined allow the construction of prior distributions for multivariate normal parameters from a single prior Bayesian network specified by a user.
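A minimal Python sketch of the two-component architecture described above, with a BIC-style Gaussian score standing in for the Bayesian metric developed in the paper and a simple greedy arc-addition search; the data and variable names are hypothetical.

import numpy as np

def node_score(data, node, parents):
    """BIC-style Gaussian score of one node given its parents.

    A stand-in for the paper's Bayesian metric: fit the node by linear
    regression on its parents and penalize the number of parameters.
    """
    n = data.shape[0]
    y = data[:, node]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    var = max(resid @ resid / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * var) + 1.0)
    return loglik - 0.5 * (len(parents) + 2) * np.log(n)

def creates_cycle(parents, child, parent):
    """Would adding the arc parent -> child close a directed cycle?"""
    stack, seen = [parent], set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def greedy_search(data):
    """Greedy search: repeatedly add the arc that most improves the total score."""
    n_vars = data.shape[1]
    parents = {v: [] for v in range(n_vars)}
    scores = {v: node_score(data, v, []) for v in range(n_vars)}
    improved = True
    while improved:
        improved, best = False, (0.0, None)
        for child in range(n_vars):
            for parent in range(n_vars):
                if parent == child or parent in parents[child]:
                    continue
                if creates_cycle(parents, child, parent):
                    continue
                gain = node_score(data, child, parents[child] + [parent]) - scores[child]
                if gain > best[0]:
                    best = (gain, (parent, child))
        if best[1] is not None:
            parent, child = best[1]
            parents[child].append(parent)
            scores[child] += best[0]
            improved = True
    return parents

# Hypothetical data in which variable 2 depends linearly on variables 0 and 1.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
x2 = 2.0 * x0 - x1 + 0.1 * rng.normal(size=500)
print(greedy_search(np.column_stack([x0, x1, x2])))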
Models and Selection Criteria for Regression and Classification
Heckerman, David, Meek, Christopher
When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatory or input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we call Bayesian regression/classification (BRC) models, that can be factored into independent conditional (y|x) and input (x) models. These models are convenient, because the conditional model (the portion of the full model that we care about) can be analyzed by itself. We examine the practice of transforming arbitrary Bayesian models to BRC models, and argue that this practice is often inappropriate because it ignores prior knowledge that may be important for learning. In addition, we examine Bayesian methods for learning models from data. We discuss two criteria for Bayesian model selection that are appropriate for regression/classification: one described by Spiegelhalter et al. (1993), and another by Buntine (1993). We contrast these two criteria using the prequential framework of Dawid (1984), and give sufficient conditions under which the criteria agree.
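As an illustration of the prequential idea used in the comparison, the Python sketch below scores a model by the sum of its one-step-ahead predictive log probabilities; the two toy models and their predict/update interface are hypothetical, and the sketch illustrates the framework itself rather than either of the two selection criteria analyzed in the paper.

import math

def prequential_log_score(model, data):
    """Sum of one-step-ahead log predictive probabilities (Dawid's prequential score).

    model.predict(x) returns P(y=1 | x, cases seen so far); model.update(x, y)
    incorporates the new case. Both are hypothetical interfaces for this sketch.
    """
    total = 0.0
    for x, y in data:
        p = model.predict(x)
        total += math.log(p if y == 1 else 1.0 - p)
        model.update(x, y)
    return total

class BetaBernoulliPerInput:
    """Laplace-smoothed conditional model: a separate Bernoulli for each value of x."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        ones, total = self.counts.get(x, (0, 0))
        return (ones + 1) / (total + 2)
    def update(self, x, y):
        ones, total = self.counts.get(x, (0, 0))
        self.counts[x] = (ones + y, total + 1)

class MarginalBernoulli(BetaBernoulliPerInput):
    """Baseline that ignores x entirely."""
    def predict(self, x):
        return super().predict(None)
    def update(self, x, y):
        super().update(None, y)

# Hypothetical data in which y depends strongly on x: the conditional model should
# accumulate a higher prequential score than the marginal baseline.
data = [(x, 1 if x == "a" else 0) for x in "aababbaabb" * 5]
print(prequential_log_score(BetaBernoulliPerInput(), data),
      prequential_log_score(MarginalBernoulli(), data))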