Goto

Collaborating Authors

 Bayesian Inference


A direct method for estimating a causal ordering in a linear non-Gaussian acyclic model

arXiv.org Machine Learning

Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the datagenerating process of variables. Recently, it was shown that use of non-Gaussianity identifies a causal ordering of variables in a linear acyclic model without using any prior knowledge on the network structure, which is not the case with conventional methods. However, existing estimation methods are based on iterative search algorithms and may not converge to a correct solution in a finite number of steps. In this paper, we propose a new direct method to estimate a causal ordering based on non-Gaussianity. In contrast to the previous methods, our algorithm requires no algorithmic parameters and is guaranteed to converge to the right solution within a small fixed number of steps if the data strictly follows the model.


Learning Graphical Models With Hubs

arXiv.org Machine Learning

We consider the problem of learning a high-dimensional graphical model in which certain hub nodes are highly-connected to many other nodes. Many authors have studied the use of an l1 penalty in order to learn a sparse graph in high-dimensional setting. However, the l1 penalty implicitly assumes that each edge is equally likely and independent of all other edges. We propose a general framework to accommodate more realistic networks with hub nodes, using a convex formulation that involves a row-column overlap norm penalty. We apply this general framework to three widely-used probabilistic graphical models: the Gaussian graphical model, the covariance graph model, and the binary Ising model. An alternating direction method of multipliers algorithm is used to solve the corresponding convex optimization problems. On synthetic data, we demonstrate that our proposed framework outperforms competitors that do not explicitly model hub nodes. We illustrate our proposal on a webpage data set and a gene expression data set.


Scoring and Searching over Bayesian Networks with Causal and Associative Priors

arXiv.org Artificial Intelligence

A significant theoretical advantage of search-and-score methods for learning Bayesian Networks is that they can accept informative prior beliefs for each possible network, thus complementing the data. In this paper, a method is presented for assigning priors based on beliefs on the presence or absence of certain paths in the true network. Such beliefs correspond to knowledge about the possible causal and associative relations between pairs of variables. This type of knowledge naturally arises from prior experimental and observational data, among others. In addition, a novel search-operator is proposed to take advantage of such prior knowledge. Experiments show that, using path beliefs improves the learning of the skeleton, as well as the edge directions in the network.


Updating Sets of Probabilities

arXiv.org Artificial Intelligence

There are several well-known justifications for conditioning as the appropriate method for updating a single probability measure, given an observation. However, there is a significant body of work arguing for sets of probability measures, rather than single measures, as a more realistic model of uncertainty. Conditioning still makes sense in this context--we can simply condition each measure in the set individually, then combine the results--and, indeed, it seems to be the preferred updating procedure in the literature. But how justified is conditioning in this richer setting? Here we show, by considering an axiomatic account of conditioning given by van Fraassen, that the single-measure and sets-of-measures cases are very different. We show that van Fraassen's axiomatization for the former case is nowhere near sufficient for updating sets of measures. We give a considerably longer (and not as compelling) list of axioms that together force conditioning in this setting, and describe other update methods that are allowed once any of these axioms is dropped.


A Bayesian estimation approach to analyze non-Gaussian data-generating processes with latent classes

arXiv.org Machine Learning

Several methods have recently been proposed to discover a complete causal structure, that is, all the causal directions, under the assumption that disturbance variables have non-Gaussian distributions. However, the estimation results can be biased if there are "latent classes." Latent classes are unobserved discrete variables that have more than one observed child variables. Data that has been generated from different processes are mixed in the presence of latent classes. Several methods have been proposed to estimate the causal structure in the presence of latent classes [12], but all of these are affected by local optima. Therefore, in this paper, we propose a new estimation approach that can solve this problem. The structure of this paper is as follows.


Fast Bayesian Feature Selection for High Dimensional Linear Regression in Genomics via the Ising Approximation

arXiv.org Machine Learning

Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach -- the Bayesian Ising Approximation (BIA) -- to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high dimensional regression by analyzing a gene expression dataset with nearly 30,000 features.


Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression

arXiv.org Machine Learning

In Bayesian analysis the form of the posterior distribution is often not analytically tractable. To obtain quantities of interest under such a distribution, such as moments or marginal distributions, we typically need to use Monte Carlo methods or approximate the posterior with a more convenient distribution. A popular method of obtaining such an approximation is structured or fixed-form Variational Bayes, which works by numerically minimizing the Kullback-Leibler divergence of an approximating distribution in the exponential family to the intractable target distribution (Attias, 2000; Beal and Ghahramani, 2006; Jordan et al., 1999; Wainwright and Jordan, 2008). For certain problems, algorithms exist that can solve this optimization problem in much less time than it would take to approximate the posterior using Monte Carlo methods (see e.g.


A Logic for Reasoning about Evidence

arXiv.org Artificial Intelligence

We introduce a logic for reasoning about evidence, that essentially views evidence as a function from prior beliefs (before making an observation) to posterior beliefs (after making the observation). We provide a sound and complete axiomatization for the logic, and consider the complexity of the decision problem. Although the reasoning in the logic is mainly propositional, we allow variables representing numbers and quantification over them. This expressive power seems necessary to capture important properties of evidence.


A Game-Theoretic Analysis of Updating Sets of Probabilities

arXiv.org Artificial Intelligence

We consider how an agent should update her uncertainty when it is represented by a set P of probability distributions and the agent observes that a random variable X takes on value x, given that the agent makes decisions using the minimax criterion, perhaps the best-studied and most commonly-used criterion in the literature. We adopt a game-theoretic framework, where the agent plays against a bookie, who chooses some distribution from P. We consider two reasonable games that differ in what the bookie knows when he makes his choice. Anomalies that have been observed before, like time inconsistency, can be understood as arising because different games are being played, against bookies with different information. We characterize the important special cases in which the optimal decision rules according to the minimax criterion amount to either conditioning or simply ignoring the information. Finally, we consider the relationship between conditioning and calibration when uncertainty is described by sets of probabilities.


When Ignorance is Bliss

arXiv.org Artificial Intelligence

It is commonly-accepted wisdom that more information is better, and that information should never be ignored. Here we argue, using both a Bayesian and a non-Bayesian analysis, that in some situations you are better off ignoring information if your uncertainty is represented by a set of probability measures. These include situations in which the information is relevant for the prediction task at hand. In the non-Bayesian analysis, we show how ignoring information avoids dilation, the phenomenon that additional pieces of information sometimes lead to an increase in uncertainty. In the Bayesian analysis, we show that for small sample sizes and certain prediction tasks, the Bayesian posterior based on a noninformative prior yields worse predictions than simply ignoring the given information.