When eliciting probability models from experts, knowledge engineers may compare the results of the model with expert judgment on test scenarios, then adjust model parameters to bring the behavior of the model more in line with the expert's intuition. This paper presents a methodology for analytic computation of sensitivity values to measure the impact of small changes in a network parameter on a target probability value or distribution. These values can be used to guide knowledge elicitation. They can also be used in a gradient descent algorithm to estimate parameter values that maximize a measure of goodness-of-fit to both local and holistic probability assessments.
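As an illustration of what a sensitivity value measures, the following minimal sketch uses a hypothetical two-node network D -> T, which is not taken from the paper. The function names and the parameters `p` (prior of D), `s` (P(T=1 | D=1)) and `f` (P(T=1 | D=0)) are assumptions made for this example. It compares the closed-form derivative of the target probability P(D=1 | T=1) with respect to the prior against a numerical finite-difference estimate.

```python
def posterior(p, s, f):
    """Target probability P(D=1 | T=1) in a two-node network D -> T."""
    evidence = p * s + (1.0 - p) * f  # P(T=1)
    return p * s / evidence


def sensitivity_to_prior(p, s, f):
    """Analytic derivative of P(D=1 | T=1) with respect to the prior p.

    Differentiating p*s / (p*s + (1-p)*f) by the quotient rule
    simplifies to s*f / P(T=1)**2.
    """
    evidence = p * s + (1.0 - p) * f
    return s * f / evidence ** 2


def finite_difference(p, s, f, eps=1e-6):
    """Central-difference check of the analytic sensitivity."""
    return (posterior(p + eps, s, f) - posterior(p - eps, s, f)) / (2.0 * eps)


if __name__ == "__main__":
    p, s, f = 0.01, 0.9, 0.05  # illustrative numbers only
    print("posterior:          ", posterior(p, s, f))
    print("analytic derivative:", sensitivity_to_prior(p, s, f))
    print("finite difference:  ", finite_difference(p, s, f))
```

A large derivative flags a parameter whose elicited value deserves extra care, since small changes in it move the target probability substantially.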
Within the Kolmogorov theory of probability, Bayes' rule allows one to perform statistical inference by relating conditional probabilities to unconditional probabilities. As we show here, however, there is a continuous set of alternative inference rules that yield the same results, and that may have computational or practical advantages for certain problems. We formulate generalized axioms for probability theory, according to which the reverse conditional probability distribution P(B|A) is not specified by the forward conditional probability distribution P(A|B) and the marginals P(A) and P(B). Thus, in order to perform statistical inference, one must specify an additional "inference axiom," which relates P(B|A) to P(A|B), P(A), and P(B). We show that when Bayes' rule is chosen as the inference axiom, the axioms are equivalent to the classical Kolmogorov axioms. We then derive consistency conditions on the inference axiom, and thereby characterize the set of all possible rules for inference. The set of "first-order" inference axioms, defined as the set of axioms in which P(B|A) depends on the first power of P(A|B), is found to be a 1-simplex, with Bayes' rule at one of the extreme points. The other extreme point, the "inversion rule," is studied in depth.
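For reference, Bayes' rule can be written in the abstract's notation as below, assuming P(A) > 0. The right-hand side is linear in P(A|B), which is consistent with its description as a "first-order" rule. The form of the inversion rule is not given in the abstract and is not reproduced here.

```latex
P(B \mid A) \;=\; \frac{P(A \mid B)\, P(B)}{P(A)}, \qquad P(A) > 0 .
```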
We describe a Groebner basis of relations among conditional probabilities in a discrete probability space, with any set of conditioned-upon events. These relations may be specialized to the partially-observed random variable case, the purely conditional case, and other special cases. We also investigate the connection to generalized permutohedra and describe a conditional probability simplex.
Suppose we are given the conditional probability of one variable given some other variables. Normally the full joint distribution over the conditioning variables is required to determine the probability of the conditioned variable. Under what circumstances are the marginal distributions over the conditioning variables sufficient to determine the probability of the conditioned variable? Sufficiency in this sense is equivalent to additive separability of the conditional probability distribution. Such separability structure is natural and can be exploited for efficient inference. Separability has a natural generalization to conditional separability. Separability provides a precise notion of weakly interacting subsystems in temporal probabilistic models. Given a system that is decomposed into separable subsystems, exact marginal probabilities over subsystems at future points in time can be computed by propagating marginal subsystem probabilities, rather than complete system joint probabilities. Thus, separability can make exact prediction tractable. However, observations can break separability, so exact monitoring of dynamic systems remains hard.
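The sufficiency claim can be checked numerically on a toy case. The sketch below assumes a binary variable Y with two binary conditioning variables and an additively separable conditional P(Y=1 | x1, x2) = A[x1] + B[x2]. The tables `A` and `B`, the two joint distributions, and all function names are hypothetical choices for this example, not taken from the paper. Two joints with identical marginals give the same P(Y=1) under the separable conditional, but different values under a non-separable AND-style conditional.

```python
# Additive components of the separable conditional P(Y=1 | x1, x2).
A = [0.1, 0.3]
B = [0.2, 0.4]


def separable_cpd(x1, x2):
    """Additively separable conditional: depends on x1 and x2 only through a sum."""
    return A[x1] + B[x2]


def and_cpd(x1, x2):
    """Non-separable conditional for contrast: Y=1 exactly when x1 and x2 are both 1."""
    return float(x1 and x2)


def prob_y_from_joint(cpd, joint):
    """P(Y=1) computed from the full joint over (x1, x2)."""
    return sum(p * cpd(x1, x2) for (x1, x2), p in joint.items())


def marginals(joint):
    """Marginal distributions of x1 and x2 as two lists indexed by value."""
    m1, m2 = [0.0, 0.0], [0.0, 0.0]
    for (x1, x2), p in joint.items():
        m1[x1] += p
        m2[x2] += p
    return m1, m2


def prob_y_from_marginals(m1, m2):
    """P(Y=1) for the separable conditional, using the marginals only."""
    return sum(A[x] * m1[x] for x in (0, 1)) + sum(B[x] * m2[x] for x in (0, 1))


# Two joints with the same uniform marginals but different dependence.
independent = {(x1, x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}
correlated = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

if __name__ == "__main__":
    for name, joint in (("independent", independent), ("correlated", correlated)):
        m1, m2 = marginals(joint)
        print(name,
              "separable via joint:", prob_y_from_joint(separable_cpd, joint),
              "via marginals:", prob_y_from_marginals(m1, m2),
              "AND via joint:", prob_y_from_joint(and_cpd, joint))
```

The separable case works because the sum over the joint splits into one sum per conditioning variable, each of which involves only that variable's marginal.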
Machine learning provides algorithms that can learn from data and make inferences or predictions on data. Bayesian networks are a class of graphical models that represent a collection of random variables and their conditional dependencies by directed acyclic graphs. In this paper, an inference algorithm for the hidden random variables of a Bayesian network is given by using the tropicalization of the marginal distribution of the observed variables. By restricting the topological structure to graded networks, an inference algorithm for graded Bayesian networks is established that evaluates the hidden random variables rank by rank and in this way yields the most probable states of the hidden variables. This algorithm can be viewed as a generalized version of the Viterbi algorithm for graded Bayesian networks.
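For orientation, the classical Viterbi algorithm that the abstract generalizes can be sketched as follows. This is the standard hidden Markov chain version, not the paper's algorithm for graded networks. It works in log space, where products become sums and the sum over hidden states is replaced by a maximum, which is the usual max-plus (tropical) reading. The model numbers are illustrative, and the sketch assumes all probabilities are strictly positive so that the logarithms are defined.

```python
import math


def viterbi(init, trans, emit, observations):
    """Most probable hidden state path of a hidden Markov chain.

    init[s]     : P(first state = s)
    trans[r][s] : P(next state = s | current state = r)
    emit[s][o]  : P(observation = o | state = s)
    Returns (path, log-probability of the best path).
    """
    n_states = len(init)
    # score[s] = best log-probability of any state path ending in state s.
    score = [math.log(init[s]) + math.log(emit[s][observations[0]])
             for s in range(n_states)]
    backptr = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Max-plus step: maximize over predecessors instead of summing.
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + math.log(trans[r][s]))
            new_score.append(score[best_prev]
                             + math.log(trans[best_prev][s])
                             + math.log(emit[s][obs]))
            pointers.append(best_prev)
        score = new_score
        backptr.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Illustrative two-state, three-symbol model (hypothetical numbers).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, logp = viterbi(init, trans, emit, [0, 1, 2])
prob = math.exp(logp)
```

The forward pass handles one time step at a time, which is loosely analogous to the rank-by-rank evaluation described above.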