 Statistical Learning


Bayesian orthogonal component analysis for sparse representation

arXiv.org Machine Learning

This paper addresses the problem of identifying a lower-dimensional space in which observed data can be sparsely represented. This under-complete dictionary learning task can be formulated as a blind separation problem of sparse sources linearly mixed by an unknown orthogonal mixing matrix. The problem is formulated in a Bayesian framework. First, the unknown sparse sources are modeled as Bernoulli-Gaussian processes. To promote sparsity, a weighted mixture of an atom at zero and a Gaussian distribution is proposed as the prior distribution for the unobserved sources. A non-informative prior distribution defined on an appropriate Stiefel manifold is chosen for the mixing matrix. Bayesian inference on the unknown parameters is conducted using a Markov chain Monte Carlo (MCMC) method. A partially collapsed Gibbs sampler is designed to generate samples asymptotically distributed according to the joint posterior distribution of the unknown model parameters and hyperparameters. These samples are then used to approximate the joint maximum a posteriori estimator of the sources and mixing matrix. Simulations conducted on synthetic data illustrate the performance of the method for recovering sparse representations. Finally, an application to sparse coding on an under-complete dictionary is investigated.
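To make the observation model concrete, here is a minimal Python sketch of the generative side only, not the partially collapsed Gibbs sampler: Bernoulli-Gaussian sources (exactly zero with probability 1 - w, Gaussian otherwise) mixed by a matrix M with orthonormal columns, i.e. a point on the Stiefel manifold. The dimensions, the weight w, the absence of noise and all function names are illustrative assumptions. Gram-Schmidt orthonormalisation of a Gaussian matrix gives a uniform draw on the Stiefel manifold, which is one natural reading of a non-informative prior there.

```python
import math
import random


def random_stiefel(n, r, rng):
    """n x r matrix with orthonormal columns (a point on the Stiefel
    manifold), via Gram-Schmidt on independent Gaussian columns."""
    cols = []
    while len(cols) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in cols:  # remove components along already accepted columns
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip a (probability-zero) degenerate draw
            cols.append([a / norm for a in v])
    return [[cols[j][i] for j in range(r)] for i in range(n)]


def bernoulli_gaussian(r, T, w, sigma, rng):
    """r x T sources: each entry is N(0, sigma^2) with probability w,
    otherwise exactly 0 (the atom at zero)."""
    return [[rng.gauss(0.0, sigma) if rng.random() < w else 0.0
             for _ in range(T)] for _ in range(r)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


rng = random.Random(0)
n, r, T = 8, 3, 50               # n > r: under-complete dictionary
M = random_stiefel(n, r, rng)    # hypothetical mixing matrix, M^T M = I
X = bernoulli_gaussian(r, T, w=0.2, sigma=1.0, rng=rng)
Y = matmul(M, X)                 # noiseless observations Y = M X
X_hat = matmul(transpose(M), Y)  # orthogonality gives M^T Y = X
```

Because M^T M = I, a known mixing matrix is trivially inverted on noiseless data; the difficulty the paper addresses is inferring M and the sparse sources jointly.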


Regularization for Matrix Completion

arXiv.org Machine Learning

We consider the problem of reconstructing a low-rank matrix from noisy observations of a subset of its entries. This task has applications in statistical learning, computer vision, and signal processing. In these contexts, "noise" generically refers to any contribution to the data that is not captured by the low-rank model. In most applications, the noise level is large compared to the underlying signal, and it is important to avoid overfitting. To tackle this problem, we define a regularized cost function well suited to spectral reconstruction methods. Within a random noise model and in the large-system limit, we prove that the resulting accuracy undergoes a phase transition depending on the noise level and on the fraction of observed entries. The cost function can be minimized using OPTSPACE (a manifold gradient descent algorithm). Numerical simulations show that this approach is competitive with state-of-the-art alternatives.
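Spectral reconstruction methods start from the observed entries alone. The Python sketch below shows that generic first step for a rank-1 target, under stated assumptions: unobserved entries are zero-filled, the result is rescaled by 1/p (p estimated as the observed fraction) so that, when each entry is observed independently with probability p, its expectation is the true matrix, and the leading singular component is kept via power iteration. It is not OPTSPACE and does not implement the paper's regularized cost function; trimming is omitted, and all names are hypothetical.

```python
import math
import random


def top_singular_triplet(Z, iters=200, seed=0):
    """Power iteration on Z^T Z: returns (sigma, u, v) for the largest
    singular value of the m x n matrix Z (assumed nonzero)."""
    rng = random.Random(seed)
    m, n = len(Z), len(Z[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        u = [sum(Z[i][j] * v[j] for j in range(n)) for i in range(m)]  # Z v
        v = [sum(Z[i][j] * u[i] for i in range(m)) for j in range(n)]  # Z^T u
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(Z[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


def spectral_rank1_estimate(M, mask):
    """Zero-fill unobserved entries, rescale by 1/p, keep the leading
    singular component. mask[i][j] is True where M[i][j] is observed."""
    m, n = len(M), len(M[0])
    p = sum(map(sum, mask)) / (m * n)
    Z = [[M[i][j] / p if mask[i][j] else 0.0 for j in range(n)]
         for i in range(m)]
    sigma, u, v = top_singular_triplet(Z)
    return [[sigma * u[i] * v[j] for j in range(n)] for i in range(m)]


rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(20)]
b = [rng.gauss(0, 1) for _ in range(15)]
M = [[ai * bj for bj in b] for ai in a]               # exact rank-1 matrix
mask = [[rng.random() < 0.5 for _ in b] for _ in a]   # about half observed
M_hat = spectral_rank1_estimate(M, mask)              # approximate estimate
```

With every entry observed the estimate reproduces a rank-1 matrix exactly; with a partial mask it is only an approximation, which is where a regularized refinement step would come in.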


Sharing Features among Dynamical Systems with Beta Processes

Neural Information Processing Systems

We propose a Bayesian nonparametric approach to relating multiple time series via a set of latent, dynamical behaviors. Using a beta process prior, we allow data-driven selection of the size of this set, as well as the pattern with which behaviors are shared among time series. Via the Indian buffet process representation of the beta process predictive distributions, we develop an exact Markov chain Monte Carlo inference method. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth/death proposals. We validate our sampling algorithm using several synthetic datasets, and also demonstrate promising unsupervised segmentation of visual motion capture data.
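The Indian buffet process representation mentioned above can be sampled directly. In this minimal Python sketch (names and alpha are illustrative), each row plays the role of one time series and each column one shared dynamical behavior: customer i takes an existing dish k with probability m_k / i, where m_k counts earlier customers who took it, and then tries Poisson(alpha / i) new dishes, so the number of behaviors is not fixed in advance. It draws only from the prior; the MCMC inference, sum-product computations and birth/death proposals described in the abstract are not shown.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's method for a Poisson(lam) draw (adequate for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_ibp(num_customers, alpha, seed=0):
    """Binary feature matrix from the Indian buffet process prior:
    rows are customers (time series), columns are dishes (behaviors)."""
    rng = random.Random(seed)
    counts = []  # m_k: how many customers have taken dish k so far
    rows = []
    for i in range(1, num_customers + 1):
        row = [1 if rng.random() < m / i else 0 for m in counts]
        new = poisson(alpha / i, rng)
        row.extend([1] * new)
        counts = [m + z for m, z in zip(counts, row)] + [1] * new
        rows.append(row)
    K = len(counts)  # pad earlier rows so the matrix is rectangular
    return [row + [0] * (K - len(row)) for row in rows]


Z = sample_ibp(10, alpha=2.0, seed=3)
```

Popular behaviors are reused with probability proportional to how often they have been used, which is how the prior encourages sharing across time series.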


Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

Neural Information Processing Systems

Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be non-stationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches.
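A common way to work with a beta process prior in practice is a finite approximation with K candidate atoms, where pi_k ~ Beta(a/K, b(K-1)/K) and each patch uses atom k with probability pi_k. The Python sketch below assumes that approximation, standard normal weights and illustrative hyperparameters a and b; it only draws from the generative model and performs neither Gibbs nor variational inference. All function names are hypothetical, and K must be at least 2.

```python
import random


def sample_beta_bernoulli_codes(N, K, a, b, seed=0):
    """Finite beta-Bernoulli approximation over K candidate atoms:
    pi_k ~ Beta(a/K, b(K-1)/K), z_ik ~ Bernoulli(pi_k), s_ik ~ N(0, 1).
    Returns (pi, codes) with codes[i][k] = z_ik * s_ik."""
    rng = random.Random(seed)
    pi = [rng.betavariate(a / K, b * (K - 1) / K) for _ in range(K)]
    codes = [[rng.gauss(0.0, 1.0) if rng.random() < p else 0.0 for p in pi]
             for _ in range(N)]
    return pi, codes


def synthesize(D, codes, noise_sd, seed=0):
    """Patches x_i = D (z_i * s_i) + Gaussian noise, with D of size P x K."""
    rng = random.Random(seed)
    P, K = len(D), len(D[0])
    return [[sum(D[p][k] * c[k] for k in range(K)) + rng.gauss(0.0, noise_sd)
             for p in range(P)] for c in codes]


pi, codes = sample_beta_bernoulli_codes(N=30, K=64, a=1.0, b=1.0, seed=2)
```

Since a/K is small, most pi_k are close to zero (their mean here is 1/64), so most candidate atoms are rarely or never used; this is the mechanism by which such a model can settle on an effective dictionary size.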


Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation

Neural Information Processing Systems

We show that an important and computationally challenging solution space feature of the graph coloring problem (COL), namely the number of clusters of solutions, can be accurately estimated by a technique very similar to one for counting the number of solutions. This cluster counting approach can be naturally written in terms of a new factor graph derived from the factor graph representing the COL instance. Using a variant of the Belief Propagation inference framework, we can efficiently approximate cluster counts in random COL problems over a large range of graph densities. We illustrate the algorithm on instances with up to 100,000 vertices. Moreover, we supply a methodology for computing the number of clusters exactly using advanced techniques from the knowledge compilation literature.
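For very small graphs, the quantities being estimated can be computed exhaustively, which helps make "clusters of solutions" concrete. This Python sketch assumes one common definition: two proper colorings are neighbors if they differ at exactly one vertex, and a cluster is a connected component under that relation; the paper's precise definition may differ, and this is neither its Belief Propagation method nor its knowledge-compilation method. The function name is hypothetical. Enumeration costs q^n, which is why an approximate method is needed at the scale of 100,000 vertices.

```python
from itertools import product


def colorings_and_clusters(num_vertices, edges, q):
    """Count proper q-colorings of a small graph and the clusters they form
    (components under single-vertex recoloring)."""
    sols = [c for c in product(range(q), repeat=num_vertices)
            if all(c[u] != c[v] for u, v in edges)]
    sol_set = set(sols)
    seen, clusters = set(), 0
    for s in sols:
        if s in seen:
            continue
        clusters += 1
        stack = [s]
        seen.add(s)
        while stack:  # depth-first search over single-vertex recolorings
            cur = stack.pop()
            for v in range(num_vertices):
                for color in range(q):
                    nb = cur[:v] + (color,) + cur[v + 1:]
                    if nb in sol_set and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
    return len(sols), clusters


edge = colorings_and_clusters(2, [(0, 1)], 3)
triangle = colorings_and_clusters(3, [(0, 1), (1, 2), (0, 2)], 3)
```

For a single edge with 3 colors, all 6 colorings form one cluster; for a triangle, no vertex can be recolored without a conflict, so each of the 6 colorings is its own cluster.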


Theory of matching pursuit

Neural Information Processing Systems

We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that the resulting bound is tighter than the KPCA bound of Shawe-Taylor et al. [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm, kernel matching pursuit (KMP), which does not correspond to a sample compression scheme. However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme, and hence provide a VC bound on its future loss. Finally, we describe how the same bound can be applied to other algorithms related to matching pursuit.
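The sample compression argument rests on the fact that the learned subspace is spanned by a few chosen training points. The Python sketch below illustrates this, assuming the usual greedy rule for sparse KPCA: pick the point whose deflated feature vector captures the most residual variance, sum_j K[j][i]^2 / K[i][i], then project that direction out of the kernel matrix. The returned indices form the compression set, and the residual trace is the variance not yet captured. The function name and the 1e-12 tolerance are hypothetical, and KMP is not shown.

```python
def sparse_kpca_matching_pursuit(K, num_components):
    """Greedy matching pursuit on a kernel matrix K (list of lists).
    Returns the chosen training indices and the residual trace after
    each step."""
    n = len(K)
    R = [row[:] for row in K]  # residual (deflated) kernel matrix
    chosen, traces = [], []
    for _ in range(num_components):
        best, best_score = None, 0.0
        for i in range(n):
            if R[i][i] > 1e-12:  # skip directions already explained
                score = sum(R[j][i] ** 2 for j in range(n)) / R[i][i]
                if score > best_score:
                    best, best_score = i, score
        if best is None:  # residual is (numerically) zero
            break
        col, d = [R[j][best] for j in range(n)], R[best][best]
        # Deflation: R <- R - R[:, i] R[i, :] / R[i, i]
        R = [[R[a][b] - col[a] * col[b] / d for b in range(n)]
             for a in range(n)]
        chosen.append(best)
        traces.append(sum(R[a][a] for a in range(n)))
    return chosen, traces


# Linear kernel on the points (1, 0), (0, 1), (1, 1): rank 2.
K2 = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
chosen2, traces2 = sparse_kpca_matching_pursuit(K2, 3)
# chosen2 -> [2, 0]; traces2 -> [1.0, 0.0]
```

Two training points suffice to span this rank-2 kernel, so the loop stops early; the size of the compression set is what enters a compression bound.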



Multi-Label Prediction via Compressed Sensing

Neural Information Processing Systems

We consider multi-label prediction problems with large output spaces under the assumption of output sparsity, that is, that the target (label) vectors have small support. We develop a general theory for a variant of the popular error-correcting output code scheme, using ideas from compressed sensing to exploit this sparsity. The method can be regarded as a simple reduction from multi-label regression problems to binary regression problems. We show that the number of subproblems need only be logarithmic in the total number of possible labels, making this approach radically more efficient than others. We also state and prove robustness guarantees for this method in the form of regret transform bounds for the general case, and provide a more detailed analysis for the linear prediction setting.
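The compression idea can be sketched in a few lines: a sparse label vector y in {0,1}^d is replaced by m real-valued targets z = A y, with m on the order of k log d for k active labels; a regressor would be trained for each coordinate of z, and a sparse-recovery decoder maps predictions back to labels. In this minimal Python sketch, A is a random Gaussian matrix with unit-norm columns and the decoder is orthogonal matching pursuit; both are assumptions chosen for illustration, the regressors are omitted, and all names are hypothetical.

```python
import math
import random


def gaussian_measurements(m, d, seed=0):
    """m x d random Gaussian matrix with unit-norm columns."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        c = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in c))
        cols.append([x / norm for x in c])
    return [[cols[j][i] for j in range(d)] for i in range(m)]


def compress(A, y):
    """z = A y: the m regression targets that replace the d labels."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def solve(G, h):
    """Solve the small system G x = h by Gaussian elimination."""
    n = len(h)
    M = [G[i][:] + [h[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def omp_decode(A, z, k):
    """Orthogonal matching pursuit: grow a support of size k, refitting the
    coefficients by least squares after each selection."""
    m, d = len(A), len(A[0])
    support, coef, resid = [], [], list(z)
    for _ in range(k):
        # Column most correlated with the current residual.
        corr = [abs(sum(A[i][j] * resid[i] for i in range(m))) for j in range(d)]
        best = max((j for j in range(d) if j not in support), key=corr.__getitem__)
        support.append(best)
        # Least squares on the support via the normal equations G x = h.
        G = [[sum(A[i][a] * A[i][b] for i in range(m)) for b in support]
             for a in support]
        h = [sum(A[i][a] * z[i] for i in range(m)) for a in support]
        coef = solve(G, h)
        resid = [z[i] - sum(A[i][s] * w for s, w in zip(support, coef))
                 for i in range(m)]
    y_hat = [0.0] * d
    for s, w in zip(support, coef):
        y_hat[s] = w
    return y_hat


A = gaussian_measurements(m=10, d=50, seed=1)  # 50 labels -> 10 targets
y = [0.0] * 50
y[7] = 1.0                                     # one active label
y_hat = omp_decode(A, compress(A, y), k=1)
```

In the reduction, the decoder would be applied to the vector of predicted targets rather than to A y itself, and the decoded values thresholded to obtain labels.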