Directed Networks
A survey on independence-based Markov networks learning
Name Reference Comments KS Koller and Sahami (1996) - Not Sound - The first one of this type - Requires specifying MB size in advance GS Margaritis and Thrun (2000) - Sound in theory - Proposed to learn Bayesian network via the induction of neighbors of each variable - First proved such kind of algorithm - Works in two phases: grow and shrink IAMB and its variants Tsamardinos et al (2003) - Sound in theory - Actually variant of GS - Simple to implement - Time efficient - Very poor on data efficiency - IAMB's variants achieve better performance on data efficiency than IAMB HITON-PC/MB Aliferis et al (2003) - Not sound - Another trial to make use of the topology information to enhance data efficiency - Data efficiency comparable to IAMB - Much slower compared to IAMB Fast-IAMB Yaramakala and Margaritis (2005) - Sound in theory - No fundamental difference as compared to IAMB - Adds candidates more greedily to speed up the learning - Still poor on data efficiency performance MMPC/MB Tsamardinos et al (2006) - Not sound - The first to make use of the underling topology information - Much more data efficient compared to IAMB - Much slower compared to IAMB PCMB Peรฑa et al (2007) - Sound in theory - Data efficient by making use of topology information - Poor on time efficiency - Distinguish spouses from parents/children - Distinguish some children from parents/children IPC-MB Fu and Desmarais (2008) - Sound in theory - Most data efficient compared with previous algorithms - Much faster than PCMB on computing - Distinguish spouses from parents/children - Distinguish some children from parents/children - Best tradeoff among this family of algorithms
Nonparametric Bayes dynamic modeling of relational data
Durante, Daniele, Dunson, David B.
Symmetric binary matrices representing relations among entities are commonly collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest being in inference on the relationship structure and prediction. We propose a nonparametric Bayesian dynamic model, which reduces dimensionality in characterizing the binary matrix through a lower-dimensional latent space representation, with the latent coordinates evolving in continuous time via Gaussian processes. By using a logistic mapping function from the probability matrix space to the latent relational space, we obtain a flexible and computational tractable formulation. Employing P\`olya-Gamma data augmentation, an efficient Gibbs sampler is developed for posterior computation, with the dimension of the latent space automatically inferred. We provide some theoretical results on flexibility of the model, and illustrate performance via simulation experiments. We also consider an application to co-movements in world financial markets.
Reasoning with Uncertainties Over Existence of Objects
Ngo, Vien Anh (University of Stuttgart) | Toussaint, Marc (University of Stuttgart)
In this paper we consider planning problems in relationalMarkov processes where objects may โappearโ or โdisap-pearโ, perhaps depending on previous actions or propertiesof other objects. For instance, problems which require to ex-plicitly generate or discover objects fall into this category. Inour formulation this requires to explicitly represent the un-certainty over the number of objects (dimensions or factors)in a dynamic Bayesian networks (DBN). Many formalisms(also existing ones) are conceivable to formulate such prob-lems. We aim at a formulation that facilitates inference andplanning. Based on a specific formulation we investigate twoinference methodsโrejection sampling and reversible-jumpMCMCโto compute a posterior over the process conditionedon the first and last time slice (start and goal state). We willdiscuss properties, efficiency, and appropriateness of eachone.
Flexible sampling of discrete data correlations without the marginal distributions
Kalaitzis, Alfredo, Silva, Ricardo
Learning the joint dependence of discrete variables is a fundamental problem in machine learning, with many applications including prediction, clustering and dimensionality reduction. More recently, the framework of copula modeling has gained popularity due to its modular parameterization of joint distributions. Among other properties, copulas provide a recipe for combining flexible models for univariate marginal distributions with parametric families suitable for potentially high dimensional dependence structures. More radically, the extended rank likelihood approach of Hoff (2007) bypasses learning marginal models completely when such information is ancillary to the learning task at hand as in, e.g., standard dimensionality reduction problems or copula parameter estimation. The main idea is to represent data by their observable rank statistics, ignoring any other information from the marginals. Inference is typically done in a Bayesian framework with Gaussian copulas, and it is complicated by the fact this implies sampling within a space where the number of constraints increases quadratically with the number of data points. The result is slow mixing when using off-the-shelf Gibbs sampling. We present an efficient algorithm based on recent advances on constrained Hamiltonian Markov chain Monte Carlo that is simple to implement and does not require paying for a quadratic cost in sample size.
On Estimating Many Means, Selection Bias, and the Bootstrap
With recent advances in high throughput technology, researchers often find themselves running a large number of hypothesis tests (thousands+) and esti- mating a large number of effect-sizes. Generally there is particular interest in those effects estimated to be most extreme. Unfortunately naive estimates of these effect-sizes (even after potentially accounting for multiplicity in a testing procedure) can be severely biased. In this manuscript we explore this bias from a frequentist perspective: we give a formal definition, and show that an oracle estimator using this bias dominates the naive maximum likelihood estimate. We give a resampling estimator to approximate this oracle, and show that it works well on simulated data. We also connect this to ideas in empirical Bayes.
High-dimensional learning of linear causal networks via inverse covariance estimation
Loh, Po-Ling, Bรผhlmann, Peter
We establish a new framework for statistical estimation of directed acyclic graphs (DAGs) when data are generated from a linear, possibly non-Gaussian structural equation model. Our framework consists of two parts: (1) inferring the moralized graph from the support of the inverse covariance matrix; and (2) selecting the best-scoring graph amongst DAGs that are consistent with the moralized graph. We show that when the error variances are known or estimated to close enough precision, the true DAG is the unique minimizer of the score computed using the reweighted squared l_2-loss. Our population-level results have implications for the identifiability of linear SEMs when the error covariances are specified up to a constant multiple. On the statistical side, we establish rigorous conditions for high-dimensional consistency of our two-part algorithm, defined in terms of a "gap" between the true DAG and the next best candidate. Finally, we demonstrate that dynamic programming may be used to select the optimal DAG in linear time when the treewidth of the moralized graph is bounded.
Uniform random generation of large acyclic digraphs
Directed acyclic graphs are the basic representation of the structure underlying Bayesian networks, which represent multivariate probability distributions. In many practical applications, such as the reverse engineering of gene regulatory networks, not only the estimation of model parameters but the reconstruction of the structure itself is of great interest. As well as for the assessment of different structure learning algorithms in simulation studies, a uniform sample from the space of directed acyclic graphs is required to evaluate the prevalence of certain structural features. Here we analyse how to sample acyclic digraphs uniformly at random through recursive enumeration, an approach previously thought too computationally involved. Based on complexity considerations, we discuss in particular how the enumeration directly provides an exact method, which avoids the convergence issues of the alternative Markov chain methods and is actually computationally much faster. The limiting behaviour of the distribution of acyclic digraphs then allows us to sample arbitrarily large graphs. Building on the ideas of recursive enumeration based sampling we also introduce a novel hybrid Markov chain with much faster convergence than current alternatives while still being easy to adapt to various restrictions. Finally we discuss how to include such restrictions in the combinatorial enumeration and the new hybrid Markov chain method for efficient uniform sampling of the corresponding graphs.
Informed Source Separation: A Bayesian Tutorial
ABSTRACT Source separation problems are ubiquitous in the physical sciences; any situation where signals are superimposed calls for source separation to estimate the original signals. In this tutorial I will discuss the Bayesian approach to the source separation problem. This approach has a specific advantage in that it requires the designer to explicitly describe the signal model in addition to any other information or assumptions that go into the problem description. This leads naturally to the idea of informed source separation, where the algorithm design incorporates relevant information about the specific problem. This approach promises to enable researchers to design their own high-quality algorithms that are specifically tailored to the problem at hand. 1. UNDERSTANDING THE PROBLEM To gather information about the physical world, we deploy sensors to make measurements and detect signals. Our sensors, if properly designed, will collect information about the signals of interest. However, very often the signals of interest are comprised of a set of discrete signals, which have been superimposed during propagation, often with signals that are not of interest. Thus our sensors almost invariably detect a mixture of signals--some interesting and some noninteresting.
Moments and Root-Mean-Square Error of the Bayesian MMSE Estimator of Classification Error in the Gaussian Model
Zollanvari, Amin, Dougherty, Edward R.
The most important aspect of any classifier is its error rate, because this quantifies its predictive capacity. Thus, the accuracy of error estimation is critical. Error estimation is problematic in small-sample classifier design because the error must be estimated using the same data from which the classifier has been designed. Use of prior knowledge, in the form of a prior distribution on an uncertainty class of feature-label distributions to which the true, but unknown, feature-distribution belongs, can facilitate accurate error estimation (in the mean-square sense) in circumstances where accurate completely model-free error estimation is impossible. This paper provides analytic asymptotically exact finite-sample approximations for various performance metrics of the resulting Bayesian Minimum Mean-Square-Error (MMSE) error estimator in the case of linear discriminant analysis (LDA) in the multivariate Gaussian model. These performance metrics include the first, second, and cross moments of the Bayesian MMSE error estimator with the true error of LDA, and therefore, the Root-Mean-Square (RMS) error of the estimator. We lay down the theoretical groundwork for Kolmogorov double-asymptotics in a Bayesian setting, which enables us to derive asymptotic expressions of the desired performance metrics. From these we produce analytic finite-sample approximations and demonstrate their accuracy via numerical examples. Various examples illustrate the behavior of these approximations and their use in determining the necessary sample size to achieve a desired RMS. The Supplementary Material contains derivations for some equations and added figures.
Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization
Hoffman, Matthew W., Shahriari, Bobak, de Freitas, Nando
We address the problem of finding the maximizer of a nonlinear smooth function, that can only be evaluated point-wise, subject to constraints on the number of permitted function evaluations. This problem is also known as fixed-budget best arm identification in the multi-armed bandit literature. We introduce a Bayesian approach for this problem and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods. The Bayesian approach places emphasis on detailed modelling, including the modelling of correlations among the arms. As a result, it can perform well in situations where the number of arms is much larger than the number of allowed function evaluation, whereas the frequentist counterpart is inapplicable. This feature enables us to develop and deploy practical applications, such as automatic machine learning toolboxes. The paper presents comprehensive comparisons of the proposed approach, Thompson sampling, classical Bayesian optimization techniques, more recent Bayesian bandit approaches, and state-of-the-art best arm identification methods. This is the first comparison of many of these methods in the literature and allows us to examine the relative merits of their different features.