Goto

Collaborating Authors

 Uncertainty


Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts

arXiv.org Machine Learning

We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.


Community Detection in Networks using Graph Distance

arXiv.org Machine Learning

The study of networks has received increased attention recently not only from the social sciences and statistics but also from physicists, computer scientists and mathematicians. One of the principal problem in networks is community detection. Many algorithms have been proposed for community finding but most of them do not have have theoretical guarantee for sparse networks and networks close to the phase transition boundary proposed by physicists. There are some exceptions but all have some incomplete theoretical basis. Here we propose an algorithm based on the graph distance of vertices in the network. We give theoretical guarantees that our method works in identifying communities for block models and can be extended for degree-corrected block models and block models with the number of communities growing with number of vertices. Despite favorable simulation results, we are not yet able to conclude that our method is satisfactory for worst possible case. We illustrate on a network of political blogs, Facebook networks and some other networks.


Multimodal Transitions for Generative Stochastic Networks

arXiv.org Machine Learning

Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data generating distribution. The result of training is therefore a machine that generates samples through this Markov chain. However, the previously introduced GSN consistency theorems suggest that in order to capture a wide class of distributions, the transition operator in general should be multimodal, something that has not been done before this paper. We introduce for the first time multimodal transition distributions for GSNs, in particular using models in the NADE family (Neural Autoregressive Density Estimator) as output distributions of the transition operator. A NADE model is related to an RBM (and can thus model multimodal distributions) but its likelihood (and likelihood gradient) can be computed easily. The parameters of the NADE are obtained as a learned function of the previous state of the learned Markov chain. Experiments clearly illustrate the advantage of such multimodal transition distributions over unimodal GSNs.


Asymptotic Accuracy of Bayes Estimation for Latent Variables with Redundancy

arXiv.org Machine Learning

Hierarchical parametric models consisting of observable and latent variables are widely used for unsupervised learning tasks. For example, a mixture model is a representative hierarchical model for clustering. From the statistical point of view, the models can be regular or singular due to the distribution of data. In the regular case, the models have the identifiability; there is one-to-one relation between a probability density function for the model expression and the parameter. The Fisher information matrix is positive definite, and the estimation accuracy of both observable and latent variables has been studied. In the singular case, on the other hand, the models are not identifiable and the Fisher matrix is not positive definite. Conventional statistical analysis based on the inverse Fisher matrix is not applicable. Recently, an algebraic geometrical analysis has been developed and is used to elucidate the Bayes estimation of observable variables. The present paper applies this analysis to latent-variable estimation and determines its theoretical performance. Our results clarify behavior of the convergence of the posterior distribution. It is found that the posterior of the observable-variable estimation can be different from the one in the latent-variable estimation. Because of the difference, the Markov chain Monte Carlo method based on the parameter and the latent variable cannot construct the desired posterior distribution.


Gaussian-binary Restricted Boltzmann Machines on Modeling Natural Image Statistics

arXiv.org Machine Learning

We present a theoretical analysis of Gaussian-binary restricted Boltzmann machines (GRBMs) from the perspective of density models. The key aspect of this analysis is to show that GRBMs can be formulated as a constrained mixture of Gaussians, which gives a much better insight into the model's capabilities and limitations. We show that GRBMs are capable of learning meaningful features both in a two-dimensional blind source separation task and in modeling natural images. Further, we show that reported difficulties in training GRBMs are due to the failure of the training algorithm rather than the model itself. Based on our analysis we are able to propose several training recipes, which allowed successful and fast training in our experiments. Finally, we discuss the relationship of GRBMs to several modifications that have been proposed to improve the model.


Causal Discovery in a Binary Exclusive-or Skew Acyclic Model: BExSAM

arXiv.org Machine Learning

Discovering causal relations among observed variables in a given data set is a major objective in studies of statistics and artificial intelligence. Recently, some techniques to discover a unique causal model have been explored based on non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. In this paper, we present a novel causal model for binary data and propose an efficient new approach to deriving the unique causal model governing a given binary data set under skew distributions of external binary noises. Experimental evaluation shows excellent performance for both artificial and real world data sets.


Hilbert Space Methods for Reduced-Rank Gaussian Process Regression

arXiv.org Machine Learning

This paper proposes a novel scheme for reduced-rank Gaussian process regression. The method is based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of $\mathbb{R}^d$. On this approximate eigenbasis the eigenvalues of the covariance function can be expressed as simple functions of the spectral density of the Gaussian process, which allows the GP inference to be solved under a computational cost scaling as $\mathcal{O}(nm^2)$ (initial) and $\mathcal{O}(m^3)$ (hyperparameter learning) with $m$ basis functions and $n$ data points. The approach also allows for rigorous error analysis with Hilbert space theory, and we show that the approximation becomes exact when the size of the compact subset and the number of eigenfunctions go to infinity. The expansion generalizes to Hilbert spaces with an inner product which is defined as an integral over a specified input density. The method is compared to previously proposed methods theoretically and through empirical tests with simulated and real data.


On change point detection using the fused lasso method

arXiv.org Machine Learning

In this paper we analyze the asymptotic properties of l1 penalized maximum likelihood estimation of signals with piece-wise constant mean values and/or variances. The focus is on segmentation of a non-stationary time series with respect to changes in these model parameters. This change point detection and estimation problem is also referred to as total variation denoising or l1 -mean filtering and has many important applications in most fields of science and engineering. We establish the (approximate) sparse consistency properties, including rate of convergence, of the so-called fused lasso signal approximator (FLSA). We show that this only holds if the sign of the corresponding consecutive changes are all different, and that this estimator is otherwise incapable of correctly detecting the underlying sparsity pattern. The key idea is to notice that the optimality conditions for this problem can be analyzed using techniques related to brownian bridge theory.


Guaranteed Model Order Estimation and Sample Complexity Bounds for LDA

arXiv.org Machine Learning

The question of how to determine the number of independent latent factors, or topics, in Latent Dirichlet Allocation (LDA) is of great practical importance. In most applications, the exact number of topics is unknown, and depends on the application and the size of the data set. We introduce a spectral model selection procedure for topic number estimation that does not require learning the model's latent parameters beforehand and comes with probabilistic guarantees. The procedure is motivated by the spectral learning approach and relies on adaptations of results from random matrix theory. In a simulation experiment taken from the nonparametric Bayesian literature, this procedure outperforms the nonparametric Bayesian approach in both accuracy and speed. We also discuss some implications of our results for the sample complexity and accuracy of popular spectral learning algorithms for LDA. The principles underlying the procedure can be extended to spectral learning algorithms for other exchangeable mixture models with similar conditional independence properties.


Properties of Bethe Free Energies and Message Passing in Gaussian Models

arXiv.org Machine Learning

We address the problem of computing approximate marginals in Gaussian probabilistic models by using mean field and fractional Bethe approximations. We define the Gaussian fractional Bethe free energy in terms of the moment parameters of the approximate marginals, derive a lower and an upper bound on the fractional Bethe free energy and establish a necessary condition for the lower bound to be bounded from below. It turns out that the condition is identical to the pairwise normalizability condition, which is known to be a sufficient condition for the convergence of the message passing algorithm. We show that stable fixed points of the Gaussian message passing algorithm are local minima of the Gaussian Bethe free energy. By a counterexample, we disprove the conjecture stating that the unboundedness of the free energy implies the divergence of the message passing algorithm.