Goto

Collaborating Authors

 Bayesian Inference


Convergence rate of Bayesian tensor estimator: Optimal rate without restricted strong convexity

arXiv.org Machine Learning

In this paper, we investigate the statistical convergence rate of a Bayesian low-rank tensor estimator. Our problem setting is the regression problem where a tensor structure underlying the data is estimated. This problem setting occurs in many practical applications, such as collaborative filtering, multi-task learning, and spatio-temporal data analysis. The convergence rate is analyzed in terms of both in-sample and out-of-sample predictive accuracies. It is shown that a near optimal rate is achieved without any strong convexity of the observation. Moreover, we show that the method has adaptivity to the unknown rank of the true tensor, that is, the near optimal rate depending on the true rank is achieved even if it is not known a priori.


Marginal Likelihoods for Distributed Parameter Estimation of Gaussian Graphical Models

arXiv.org Machine Learning

We consider distributed estimation of the inverse covariance matrix, also called the concentration or precision matrix, in Gaussian graphical models. Traditional centralized estimation often requires global inference of the covariance matrix, which can be computationally intensive in large dimensions. Approximate inference based on message-passing algorithms, on the other hand, can lead to unstable and biased estimation in loopy graphical models. In this paper, we propose a general framework for distributed estimation based on a maximum marginal likelihood (MML) approach. This approach computes local parameter estimates by maximizing marginal likelihoods defined with respect to data collected from local neighborhoods. Due to the non-convexity of the MML problem, we introduce and solve a convex relaxation. The local estimates are then combined into a global estimate without the need for iterative message-passing between neighborhoods. The proposed algorithm is naturally parallelizable and computationally efficient, thereby making it suitable for high-dimensional problems. In the classical regime where the number of variables $p$ is fixed and the number of samples $T$ increases to infinity, the proposed estimator is shown to be asymptotically consistent and to improve monotonically as the local neighborhood size increases. In the high-dimensional scaling regime where both $p$ and $T$ increase to infinity, the convergence rate to the true parameters is derived and is seen to be comparable to centralized maximum likelihood estimation. Extensive numerical experiments demonstrate the improved performance of the two-hop version of the proposed estimator, which suffices to almost close the gap to the centralized maximum likelihood estimator at a reduced computational cost.


Comparing Nonparametric Bayesian Tree Priors for Clonal Reconstruction of Tumors

arXiv.org Machine Learning

Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick breaking prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor.


Gaussian Process Structural Equation Models with Latent Variables

arXiv.org Machine Learning

In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by some causal structure. This corresponds to a family of graphical models known as the structural equation model with latent variables. While linear non-Gaussian variants have been well-studied, inference in nonparametric structural equation models is still underdeveloped. We introduce a sparse Gaussian process parameterization that defines a non-linear structure connecting latent variables, unlike common formulations of Gaussian process latent variable models. The sparse parameterization is given a full Bayesian treatment without compromising Markov chain Monte Carlo efficiency. We compare the stability of the sampling procedure and the predictive ability of the model against the current practice.


Conditional Probability Tree Estimation Analysis and Algorithms

arXiv.org Machine Learning

We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works succesfully on a dataset with roughly 106 labels.


Quantum Annealing for Variational Bayes Inference

arXiv.org Machine Learning

This paper presents studies on a deterministic annealing algorithm based on quantum annealing for variational Bayes (QAVB) inference, which can be seen as an extension of the simulated annealing for variational Bayes (SAVB) inference. QAVB is as easy as SAVB to implement. Experiments revealed QAVB finds a better local optimum than SAVB in terms of the variational free energy in latent Dirichlet allocation (LDA).


Bayesian Multitask Learning with Latent Hierarchies

arXiv.org Machine Learning

We learn multiple hypotheses for related tasks under a latent hierarchical relationship between tasks. We exploit the intuition that for domain adaptation, we wish to share classifier structure, but for multitask learning, we wish to share covariance structure. Our hierarchical model is seen to subsume several previously proposed multitask learning models and performs well on three distinct real-world data sets.


Probabilistic inverse reinforcement learning in unknown environments

arXiv.org Machine Learning

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.


Non-Convex Rank Minimization via an Empirical Bayesian Approach

arXiv.org Machine Learning

In many applications that require matrix solutions of minimal rank, the underlying cost function is non-convex leading to an intractable, NP-hard optimization problem. Consequently, the convex nuclear norm is frequently used as a surrogate penalty term for matrix rank. The problem is that in many practical scenarios there is no longer any guarantee that we can correctly estimate generative low-rank matrices of interest, theoretical special cases notwithstanding. Consequently, this paper proposes an alternative empirical Bayesian procedure build upon a variational approximation that, unlike the nuclear norm, retains the same globally minimizing point estimate as the rank function under many useful constraints. However, locally minimizing solutions are largely smoothed away via marginalization, allowing the algorithm to succeed when standard convex relaxations completely fail. While the proposed methodology is generally applicable to a wide range of low-rank applications, we focus our attention on the robust principal component analysis problem (RPCA), which involves estimating an unknown low-rank matrix with unknown sparse corruptions. Theoretical and empirical evidence are presented to show that our method is potentially superior to related MAP-based approaches, for which the convex principle component pursuit (PCP) algorithm (Candes et al., 2011) can be viewed as a special case.


Bayesian Structure Learning for Markov Random Fields with a Spike and Slab Prior

arXiv.org Machine Learning

In recent years a number of methods have been developed for automatically learning the (sparse) connectivity structure of Markov Random Fields. These methods are mostly based on L1-regularized optimization which has a number of disadvantages such as the inability to assess model uncertainty and expensive crossvalidation to find the optimal regularization parameter. Moreover, the model's predictive performance may degrade dramatically with a suboptimal value of the regularization parameter (which is sometimes desirable to induce sparseness). We propose a fully Bayesian approach based on a "spike and slab" prior (similar to L0 regularization) that does not suffer from these shortcomings. We develop an approximate MCMC method combining Langevin dynamics and reversible jump MCMC to conduct inference in this model. Experiments show that the proposed model learns a good combination of the structure and parameter values without the need for separate hyper-parameter tuning. Moreover, the model's predictive performance is much more robust than L1-based methods with hyper-parameter settings that induce highly sparse model structures.