Bayesian Inference
Community detection in multi-relational data with restricted multi-layer stochastic blockmodel
In recent years there has been an increased interest in statistical analysis of data with multiple types of relations among a set of entities. Such multi-relational data can be represented as multi-layer graphs where the set of vertices represents the entities and multiple types of edges represent the different relations among them. For community detection in multi-layer graphs, we consider two random graph models, the multi-layer stochastic blockmodel (MLSBM) and a model with a restricted parameter space, the restricted multi-layer stochastic blockmodel (RMLSBM). We derive consistency results for community assignments of the maximum likelihood estimators (MLEs) in both models where MLSBM is assumed to be the true model, and either the number of nodes or the number of types of edges or both grow. We compare MLEs in the two models with other baseline approaches, such as separate modeling of layers, aggregating the layers and majority voting. RMLSBM is shown to have advantage over MLSBM when either the growth rate of the number of communities is high or the growth rate of the average degree of the component graphs in the multi-graph is low. We also derive minimax rates of error and sharp thresholds for achieving consistency of community detection in both models, which are then used to compare the multi-layer models with a baseline model, the aggregate stochastic block model. The simulation studies and real data applications confirm the superior performance of the multi-layer approaches in comparison to the baseline procedures.
Bayesian model comparison with un-normalised likelihoods
Everitt, Richard G., Johansen, Adam M., Rowing, Ellen, Evdemon-Hogan, Melina
Models for which the likelihood function can be evaluated only up to a parameter-dependent unknown normalising constant, such as Markov random field models, are used widely in computer science, statistical physics, spatial statistics, and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible due to the intractability of their likelihood functions. Several methods that permit exact, or close to exact, simulation from the posterior distribution have recently been developed. However, estimating the evidence and Bayes' factors (BFs) for these models remains challenging in general. This paper describes new random weight importance sampling and sequential Monte Carlo methods for estimating BFs that use simulation to circumvent the evaluation of the intractable likelihood, and compares them to existing methods. In some cases we observe an advantage in the use of biased weight estimates. An initial investigation into the theoretical and empirical properties of this class of methods is presented. Some support for the use of biased estimates is presented, but we advocate caution in the use of such estimates.
Online Prediction of Dyadic Data with Heterogeneous Matrix Factorization
Chen, Guangyong, Zhu, Fengyuan, Heng, Pheng Ann
Dyadic Data Prediction (DDP) is an important problem in many research areas. This paper develops a novel fully Bayesian nonparametric framework which integrates two popular and complementary approaches, discrete mixed membership modeling and continuous latent factor modeling into a unified Heterogeneous Matrix Factorization~(HeMF) model, which can predict the unobserved dyadics accurately. The HeMF can determine the number of communities automatically and exploit the latent linear structure for each bicluster efficiently. We propose a Variational Bayesian method to estimate the parameters and missing data. We further develop a novel online learning approach for Variational inference and use it for the online learning of HeMF, which can efficiently cope with the important large-scale DDP problem. We evaluate the performance of our method on the EachMoive, MovieLens and Netflix Prize collaborative filtering datasets. The experiment shows that, our model outperforms state-of-the-art methods on all benchmarks. Compared with Stochastic Gradient Method (SGD), our online learning approach achieves significant improvement on the estimation accuracy and robustness.
A Sufficient Statistics Construction of Bayesian Nonparametric Exponential Family Conjugate Models
Conjugate pairs of distributions over infinite dimensional spaces are prominent in statistical learning theory, particularly due to the widespread adoption of Bayesian nonparametric methodologies for a host of models and applications. Much of the existing literature in the learning community focuses on processes possessing some form of computationally tractable conjugacy as is the case for the beta and gamma processes (and, via normalization, the Dirichlet process). For these processes, proofs of conjugacy and requisite derivation of explicit computational formulae for posterior density parameters are idiosyncratic to the stochastic process in question. As such, Bayesian Nonparametric models are currently available for a limited number of conjugate pairs, e.g. the Dirichlet-multinomial and beta-Bernoulli process pairs. In each of these above cases the likelihood process belongs to the class of discrete exponential family distributions. The exclusion of continuous likelihood distributions from the known cases of Bayesian Nonparametric Conjugate models stands as a disparity in the researcher's toolbox. In this paper we first address the problem of obtaining a general construction of prior distributions over infinite dimensional spaces possessing distributional properties amenable to conjugacy. Second, we bridge the divide between the discrete and continuous likelihoods by illustrating a canonical construction for stochastic processes whose Levy measure densities are from positive exponential families, and then demonstrate that these processes in fact form the prior, likelihood, and posterior in a conjugate family. Our canonical construction subsumes known computational formulae for posterior density parameters in the cases where the likelihood is from a discrete distribution belonging to an exponential family.
Nonparametric semi-supervised learning of class proportions
Jain, Shantanu, White, Martha, Trosset, Michael W., Radivojac, Predrag
The problem of developing binary classifiers from positive and unlabeled data is often encountered in machine learning. A common requirement in this setting is to approximate posterior probabilities of positive and negative classes for a previously unseen data point. This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples. In this work we primarily focus on the latter subproblem. We study nonparametric class prior estimation and formulate this problem as an estimation of mixing proportions in two-component mixture models, given a sample from one of the components and another sample from the mixture itself. We show that estimation of mixing proportions is generally ill-defined and propose a canonical form to obtain identifiability while maintaining the flexibility to model any distribution. We use insights from this theory to elucidate the optimization surface of the class priors and propose an algorithm for estimating them. To address the problems of high-dimensional density estimation, we provide practical transformations to low-dimensional spaces that preserve class priors. Finally, we demonstrate the efficacy of our method on univariate and multivariate data.
Robust EM kernel-based methods for linear system identification
Bottegal, Giulio, Aravkin, Aleksandr Y., Hjalmarsson, Hรฅkan, Pillonetto, Gianluigi
Recent developments in system identification have brought attention to regularized kernel-based methods. This type of approach has been proven to compare favorably with classic parametric methods. However, current formulations are not robust with respect to outliers. In this paper, we introduce a novel method to robustify kernel-based system identification methods. To this end, we model the output measurement noise using random variables with heavy-tailed probability density functions (pdfs), focusing on the Laplacian and the Student's t distributions. Exploiting the representation of these pdfs as scale mixtures of Gaussians, we cast our system identification problem into a Gaussian process regression framework, which requires estimating a number of hyperparameters of the data size order. To overcome this difficulty, we design a new maximum a posteriori (MAP) estimator of the hyperparameters, and solve the related optimization problem with a novel iterative scheme based on the Expectation-Maximization (EM) method. In presence of outliers, tests on simulated data and on a real system show a substantial performance improvement compared to currently used kernel-based methods for linear system identification.
Streaming Gibbs Sampling for LDA Model
Gao, Yang, Chen, Jianfei, Zhu, Jun
Streaming variational Bayes (SVB) is successful in learning LDA models in an online manner. However previous attempts toward developing online Monte-Carlo methods for LDA have little success, often by having much worse perplexity than their batch counterparts. We present a streaming Gibbs sampling (SGS) method, an online extension of the collapsed Gibbs sampling (CGS). Our empirical study shows that SGS can reach similar perplexity as CGS, much better than SVB. Our distributed version of SGS, DSGS, is much more scalable than SVB mainly because the updates' communication complexity is small.
Probabilistic Programming with Gaussian Process Memoization
Schaechtle, Ulrich, Zinberg, Ben, Radul, Alexey, Stathis, Kostas, Mansinghka, Vikash K.
Gaussian Processes (GPs) are widely used tools in statistics, machine learning, robotics, computer vision, and scientific computation. However, despite their popularity, they can be difficult to apply; all but the simplest classification or regression applications require specification and inference over complex covariance functions that do not admit simple analytical posteriors. This paper shows how to embed Gaussian processes in any higher-order probabilistic programming language, using an idiom based on memoization, and demonstrates its utility by implementing and extending classic and state-of-the-art GP applications. The interface to Gaussian processes, called gpmem, takes an arbitrary real-valued computational process as input and returns a statistical emulator that automatically improve as the original process is invoked and its input-output behavior is recorded. The flexibility of gpmem is illustrated via three applications: (i) robust GP regression with hierarchical hyper-parameter learning, (ii) discovering symbolic expressions from time-series data by fully Bayesian structure learning over kernels generated by a stochastic grammar, and (iii) a bandit formulation of Bayesian optimization with automatic inference and action selection. All applications share a single 50-line Python library and require fewer than 20 lines of probabilistic code each.
Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models
Gutmann, Michael U., Corander, Jukka
Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated. It is thus also referred to as generative model. We assume that only a finite number of parameters are of interest and allow the generative process to be very general; it may be a noisy nonlinear dynamical system with an unrestricted number of hidden variables. This weak assumption is useful for devising realistic models but it renders statistical inference very difficult. The main challenge is the intractability of the likelihood function. Several likelihood-free inference methods have been proposed which share the basic idea of identifying the parameters by finding values for which the discrepancy between simulated and observed data is small. A major obstacle to using these methods is their computational cost. The cost is largely due to the need to repeatedly simulate data sets and the lack of knowledge about how the parameters affect the discrepancy. We propose a strategy which combines probabilistic modeling of the discrepancy with optimization to facilitate likelihood-free inference. The strategy is implemented using Bayesian optimization and is shown to accelerate the inference through a reduction in the number of required simulations by several orders of magnitude.
Convergence Rates of Active Learning for Maximum Likelihood Estimation
Chaudhuri, Kamalika, Kakade, Sham M., Netrapalli, Praneeth, Sanghavi, Sujay
An active learner is given a class of models, a large set of unlabeled examples, and the ability to interactively query labels of a subset of these examples; the goal of the learner is to learn a model in the class that fits the data well. Previous theoretical work has rigorously characterized label complexity of active learning, but most of this work has focused on the PAC or the agnostic PAC model. In this paper, we shift our attention to a more general setting -- maximum likelihood estimation. Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem. The conditions we require are fairly general, and cover the widely popular class of Generalized Linear Models, which in turn, include models for binary and multi-class classification, regression, and conditional random fields. We provide an upper bound on the label requirement of our algorithm, and a lower bound that matches it up to lower order terms. Our analysis shows that unlike binary classification in the realizable case, just a single extraround of interaction is sufficient to achieve near-optimal performance in maximum likelihood estimation. On the empirical side, the recent work in (Gu et al. 2012) and (Gu et al. 2014) (on active linear and logistic regression) shows the promise of this approach.