Bayesian Inference
A new axiomatization for likelihood gambles
This paper studies a new and more general axiomatization than one presented previously for preference on likelihood gambles. Likelihood gambles describe actions in a situation where a decision maker knows multiple probabilistic models and a random sample generated from one of those models but does not know prior probability of models. This new axiom system is inspired by Jensen's axiomatization of probabilistic gambles. Our approach provides a new perspective to the role of data in decision making under ambiguity. It avoids one of the most controversial issue of Bayesian methodology namely the assumption of prior probability.
Pearl's Calculus of Intervention Is Complete
This paper is concerned with graphical criteria that can be used to solve the problem of identifying casual effects from nonexperimental data in a causal Bayesian network structure, i.e., a directed acyclic graph that represents causal relationships. We first review Pearl's work on this topic [Pearl, 1995], in which several useful graphical criteria are presented. Then we present a complete algorithm [Huang and Valtorta, 2006b] for the identifiability problem. By exploiting the completeness of this algorithm, we prove that the three basic do-calculus rules that Pearl presents are complete, in the sense that, if a causal effect is identifiable, there exists a sequence of applications of the rules of the do-calculus that transforms the causal effect formula into a formula that only includes observational quantities.
The AI&M Procedure for Learning from Incomplete Data
We investigate methods for parameter learning from incomplete data that is not missing at random. Likelihood-based methods then require the optimization of a profile likelihood that takes all possible missingness mechanisms into account. Optimzing this profile likelihood poses two main difficulties: multiple (local) maxima, and its very high-dimensional parameter space. In this paper a new method is presented for optimizing the profile likelihood that addresses the second difficulty: in the proposed AI&M (adjusting imputation and mazimization) procedure the optimization is performed by operations in the space of data completions, rather than directly in the parameter space of the profile likelihood. We apply the AI&M method to learning parameters for Bayesian networks. The method is compared against conservative inference, which takes into account each possible data completion, and against EM. The results indicate that likelihood-based inference is still feasible in the case of unknown missingness mechanisms, and that conservative inference is unnecessarily weak. On the other hand, our results also provide evidence that the EM algorithm is still quite effective when the data is not missing at random.
Inequality Constraints in Causal Models with Hidden Variables
We present a class of inequality constraints on the set of distributions induced by local interventions on variables governed by a causal Bayesian network, in which some of the variables remain unmeasured. We derive bounds on causal effects that are not directly measured in randomized experiments. We derive instrumental inequality type of constraints on nonexperimental distributions. The results have applications in testing causal models with observational or experimental data.
Linear Algebra Approach to Separable Bayesian Networks
Separable Bayesian Networks, or the Influence Model, are dynamic Bayesian Networks in which the conditional probability distribution can be separated into a function of only the marginal distribution of a node's neighbors, instead of the joint distributions. In terms of modeling, separable networks has rendered possible siginificant reduction in complexity, as the state space is only linear in the number of variables on the network, in contrast to a typical state space which is exponential. In this work, We describe the connection between an arbitrary Conditional Probability Table (CPT) and separable systems using linear algebra. We give an alternate proof on the equivalence of sufficiency and separability. We present a computational method for testing whether a given CPT is separable.
Cutset Sampling with Likelihood Weighting
Bidyuk, Bozhena, Dechter, Rina
The paper analyzes theoretically and empirically the performance of likelihood weighting (LW) on a subset of nodes in Bayesian networks. The proposed scheme requires fewer samples to converge due to reduction in sampling variance. The method exploits the structure of the network to bound the complexity of exact inference used to compute sampling distributions, similar to Gibbs cutset sampling. Yet, the extension of the previosly proposed cutset sampling principles to likelihood weighting is non-trivial due to differences in the sampling processes of Gibbs sampler and LW. We demonstrate empirically that likelihood weighting on a cutset (LWLC) is effective time-wise and has a lower rejection rate than LW when applied to networks with many deterministic probabilities. Finally, we show that the performance of likelihood weighting on a cutset can be improved further by caching computed sampling distributions and, consequently, learning 'zeros' of the target distribution.
On the Robustness of Most Probable Explanations
In Bayesian networks, a Most Probable Explanation (MPE) is a complete variable instantiation with a highest probability given the current evidence. In this paper, we discuss the problem of finding robustness conditions of the MPE under single parameter changes. Specifically, we ask the question: How much change in a single network parameter can we afford to apply while keeping the MPE unchanged? We will describe a procedure, which is the first of its kind, that computes this answer for each parameter in the Bayesian network variable in time O(n exp(w)), where n is the number of network variables and w is its treewidth.
MAIES: A Tool for DNA Mixture Analysis
Cowell, Robert G., Lauritzen, Steffen L., Mortera, Julia
We describe an expert system, Maies, under development for analysing forensic identification problems involving DNA mixture traces using quantitative peak area information. Peak area information is represented by conditional Gaussian distributions, and inference based on exact junction tree propagation ascertains whether individuals, whose profiles have been measured, have contributed to the mixture. The system can also be used to predict DNA profiles of unknown contributors by separating the mixture into its individual components. The use of the system is illustrated with an application to a real world example. The system implements a novel MAP (maximum a posteriori) search algorithm that is briefly described.
An Empirical Comparison of Algorithms for Aggregating Expert Predictions
Dani, Varsha, Madani, Omid, Pennock, David M, Sanghai, Sumit, Galebach, Brian
Predicting the outcomes of future events is a challenging problem for which a variety of solution methods have been explored and attempted. We present an empirical comparison of a variety of online and offline adaptive algorithms for aggregating experts' predictions of the outcomes of five years of US National Football League games (1319 games) using expert probability elicitations obtained from an Internet contest called ProbabilitySports. We find that it is difficult to improve over simple averaging of the predictions in terms of prediction accuracy, but that there is room for improvement in quadratic loss. Somewhat surprisingly, a Bayesian estimation algorithm which estimates the variance of each expert's prediction exhibits the most consistent superior performance over simple averaging among our collection of algorithms.
Reading Dependencies from Covariance Graphs
The covariance graph (aka bi-directed graph) of a probability distribution $p$ is the undirected graph $G$ where two nodes are adjacent iff their corresponding random variables are marginally dependent in $p$. In this paper, we present a graphical criterion for reading dependencies from $G$, under the assumption that $p$ satisfies the graphoid properties as well as weak transitivity and composition. We prove that the graphical criterion is sound and complete in certain sense. We argue that our assumptions are not too restrictive. For instance, all the regular Gaussian probability distributions satisfy them.