Bayesian Learning
Inference Using Message Propagation and Topology Transformation in Vector Gaussian Continuous Networks
Alag, Satnam, Agogino, Alice M.
We extend Gaussian networks - directed acyclic graphs that encode probabilistic relationships between variables - to its vector form. Vector Gaussian continuous networks consist of composite nodes representing multivariates, that take continuous values. These vector or composite nodes can represent correlations between parents, as opposed to conventional univariate nodes. We derive rules for inference in these networks based on two methods: message propagation and topology transformation. These two approaches lead to the development of algorithms, that can be implemented in either a centralized or a decentralized manner. The domain of application of these networks are monitoring and estimation problems. This new representation along with the rules for inference developed here can be used to derive current Bayesian algorithms such as the Kalman filter, and provide a rich foundation to develop new algorithms. We illustrate this process by deriving the decentralized form of the Kalman filter. This work unifies concepts from artificial intelligence and modern control theory.
An Algorithm for Finding Minimum d-Separating Sets in Belief Networks
Acid, Silvia, de Campos, Luis M.
The criterion commonly used in directed acyclic graphs (dags) for testing graphical independence is the well-known d-separation criterion. It allows us to build graphical representations of dependency models (usually probabilistic dependency models) in the form of belief networks, which make easy interpretation and management of independence relationships possible, without reference to numerical parameters (conditional probabilities). In this paper, we study the following combinatorial problem: finding the minimum d-separating set for two nodes in a dag. This set would represent the minimum information (in the sense of minimum number of variables) necessary to prevent these two nodes from influencing each other. The solution to this basic problem and some of its extensions can be useful in several ways, as we shall see later. Our solution is based on a two-step process: first, we reduce the original problem to the simpler one of finding a minimum separating set in an undirected graph, and second, we develop an algorithm for solving it.
Computational Complexity Reduction for BN2O Networks Using Similarity of States
Kozlov, Alexander V., Singh, Jaswinder Pal
Although probabilistic inference in a general Bayesian belief network is an NP-hard problem, computation time for inference can be reduced in most practical cases by exploiting domain knowledge and by making approximations in the knowledge representation. In this paper we introduce the property of similarity of states and a new method for approximate knowledge representation and inference which is based on this property. We define two or more states of a node to be similar when the ratio of their probabilities, the likelihood ratio, does not depend on the instantiations of the other nodes in the network. We show that the similarity of states exposes redundancies in the joint probability distribution which can be exploited to reduce the computation time of probabilistic inference in networks with multiple similar states, and that the computational complexity in the networks with exponentially many similar states might be polynomial. We demonstrate our ideas on the example of a BN2O network -- a two layer network often used in diagnostic problems -- by reducing it to a very close network with multiple similar states. We show that the answers to practical queries converge very fast to the answers obtained with the original network. The maximum error is as low as 5% for models that require only 10% of the computation time needed by the original BN2O model.
Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models
Flynn, Cheryl J., Hurvich, Clifford M., Simonoff, Jeffrey S.
It has been shown that AIC-type criteria are asymptotically efficient selectors of the tuning parameter in non-concave penalized regression methods under the assumption that the population variance is known or that a consistent estimator is available. We relax this assumption to prove that AIC itself is asymptotically efficient and we study its performance in finite samples. In classical regression, it is known that AIC tends to select overly complex models when the dimension of the maximum candidate model is large relative to the sample size. Simulation studies suggest that AIC suffers from the same shortcomings when used in penalized regression. We therefore propose the use of the classical corrected AIC (AICc) as an alternative and prove that it maintains the desired asymptotic properties. To broaden our results, we further prove the efficiency of AIC for penalized likelihood methods in the context of generalized linear models with no dispersion parameter. Similar results exist in the literature but only for a restricted set of candidate models. By employing results from the classical literature on maximum-likelihood estimation in misspecified models, we are able to establish this result for a general set of candidate models. We use simulations to assess the performance of AIC and AICc, as well as that of other selectors, in finite samples for both SCAD-penalized and Lasso regressions and a real data example is considered.
Support and Plausibility Degrees in Generalized Functional Models
By discussing several examples, the theory of generalized functional models is shown to be very natural for modeling some situations of reasoning under uncertainty. A generalized functional model is a pair (f, P) where f is a function describing the interactions between a parameter variable, an observation variable and a random source, and P is a probability distribution for the random source. Unlike traditional functional models, generalized functional models do not require that there is only one value of the parameter variable that is compatible with an observation and a realization of the random source. As a consequence, the results of the analysis of a generalized functional model are not expressed in terms of probability distributions but rather by support and plausibility functions. The analysis of a generalized functional model is very logical and is inspired from ideas already put forward by R.A. Fisher in his theory of fiducial probability.
Models and Selection Criteria for Regression and Classification
Heckerman, David, Meek, Christopher
When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatoryor input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we call Bayesian regression/classification (BRC) models, that can be factored into independent conditional (y|x) and input (x) models. These models are convenient, because the conditional model (the portion of the full model that we care about) can be analyzed by itself. We examine the practice of transforming arbitrary Bayesian models to BRC models, and argue that this practice is often inappropriate because it ignores prior knowledge that may be important for learning. In addition, we examine Bayesian methods for learning models from data. We discuss two criteria for Bayesian model selection that are appropriate for repression/classification: one described by Spiegelhalter et al. (1993), and another by Buntine (1993). We contrast these two criteria using the prequential framework of Dawid (1984), and give sufficient conditions under which the criteria agree.
Update Rules for Parameter Estimation in Bayesian Networks
Bauer, Eric, Koller, Daphne, Singer, Yoram
This paper re-examines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in on-line learning [Kivinen & Warmuth, 1994]. We provide a unified framework for parameter estimation that encompasses both on-line learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional batch learning, where a pre-accumulated set of samples is used in a one-time model selection process. In the batch case, our framework encompasses both the gradient projection algorithm and the EM algorithm for Bayesian networks. The framework also leads to new on-line and batch parameter update schemes, including a parameterized version of EM. We provide both empirical and theoretical results indicating that parameterized EM allows faster convergence to the maximum likelihood parameters than does standard EM.
A Generalized Fellegi-Sunter Framework for Multiple Record Linkage With Application to Homicide Record Systems
Sadinle, Mauricio, Fienberg, Stephen E.
Mauricio Sadinle is a Ph.D. student, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213 (email: msadinle@stat.cmu.edu); and Stephen E. Fienberg is Maurice Falk University Professor of Statistics and Social Science in the Department of Statistics, the Machine Learning Department, and the Heinz College, Carnegie Mellon University (email: fien-berg@stat.cmu.edu). This research was partially supported by NSF Grants BCS-0941518 and SES-1130706 to Carnegie Mellon University, and by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office. The authors thank Rob Hall, Kristian Lum, Michael Larsen, the Associate Editor and two referees for helpful comments and suggestions on earlier versions of this paper, and Jorge A. Restrepo for providing the Colombian homicide data. An early version of this paper was written by the first author when he was affiliated to the Conflict Analysis Resource Center (CERAC) and the National University of Colombia at Bogot a. Abstract We present a probabilistic method for linking multiple datafiles. This task is not trivial in the absence of unique identifiers for the individuals recorded. This is a common scenario when linking census data to coverage measurement surveys for census coverage evaluation, and in general when multiple record-systems need to be integrated for posterior analysis. The goal of multiple record linkage is to classify the recordK -tuples coming fromK datafiles according to the different matching patterns. We use a mixture model to fit matching probabilities via maximum likelihood using the EM algorithm. We present a method to decide the recordK -tuples membership to the subsets of matching patterns and we prove its optimality. We apply our method to the integration of the three Colombian homicide record systems and perform a simulation study to explore the performance of the method under measurement error and different scenarios. The proposed method works well and opens new directions for future research. Key words and phrases: Bell number; Census undercount; Data linkage; Data matching; EM algorithm; Mixture model; Multiple systems estimation; Partially ordered set. 1 INTRODUCTION Record linkage is a widely-used technique for identifying records that refer to the same individual across different datafiles. This task is not trivial when unique identifiers are not available, and many authors have proposed probabilistic methods to deal with this problem building upon the seminal work of Newcombe et al. (1959) and Fellegi and Sunter (1969).
Independence of Causal Influence and Clique Tree Propagation
This paper explores the role of independence of causal influence (ICI) in Bayesian network inference. ICI allows one to factorize a conditional probability table into smaller pieces. We describe a method for exploiting the factorization in clique tree propagation (CTP) - the state-of-the-art exact inference algorithm for Bayesian networks. We also present empirical results showing that the resulting algorithm is significantly more efficient than the combination of CTP and previous techniques for exploiting ICI.
Score and Information for Recursive Exponential Models with Incomplete Data
Recursive graphical models usually underlie the statistical modelling concerning probabilistic expert systems based on Bayesian networks. This paper defines a version of these models, denoted as recursive exponential models, which have evolved by the desire to impose sophisticated domain knowledge onto local fragments of a model. Besides the structural knowledge, as specified by a given model, the statistical modelling may also include expert opinion about the values of parameters in the model. It is shown how to translate imprecise expert knowledge into approximately conjugate prior distributions. Based on possibly incomplete data, the score and the observed information are derived for these models. This accounts for both the traditional score and observed information, derived as derivatives of the log-likelihood, and the posterior score and observed information, derived as derivatives of the log-posterior distribution. Throughout the paper the specialization into recursive graphical models is accounted for by a simple example.