Goto

Collaborating Authors

 Bayesian Learning


Characterizing neural dependencies with copula models

Neural Information Processing Systems

The coding of information by neural populations depends critically on the statistical dependencies between neuronal responses. However, there is no simple model that combines the observations that (1) marginal distributions over single-neuron spike counts are often approximately Poisson; and (2) joint distributions over the responses of multiple neurons are often strongly dependent. Here, we show that both marginal and joint properties of neural responses can be captured using Poisson copula models. Copulas are joint distributions that allow random variables with arbitrary marginals to be combined while incorporating arbitrary dependencies between them. Different copulas capture different kinds of dependencies, allowing for a richer and more detailed description of dependencies than traditional summary statistics, such as correlation coefficients. We explore a variety of Poisson copula models for joint neural response distributions, and derive an efficient maximum likelihood procedure for estimating them. We apply these models to neuronal data collected in and macaque motor cortex, and quantify the improvement in coding accuracy afforded by incorporating the dependency structure between pairs of neurons.


A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning

Neural Information Processing Systems

In this paper we present two transductive bounds on the risk of the majority vote estimated over partially labeled training sets. Our first bound is tight when the additional unlabeled training data are used in the cases where the voted classifier makes its errors on low margin observations and where the errors of the associated Gibbs classifier can accurately be estimated. In semi-supervised learning, considering the margin as an indicator of confidence constitutes the working hypothesis of algorithms which search the decision boundary on low density regions. In this case, we propose a second bound on the joint probability that the voted classifier makes an error over an example having its margin over a fixed threshold. As an application we are interested on self-learning algorithms which assign iteratively pseudo-labels to unlabeled training examples having margin above a threshold obtained from this bound. Empirical results on different datasets show the effectiveness of our approach compared to the same algorithm and the TSVM in which the threshold is fixed manually.


Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora

Neural Information Processing Systems

We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation for web search queries respectively.


Sparse Estimation Using General Likelihoods and Non-Factorial Priors

Neural Information Processing Systems

Finding maximally sparse representations from overcomplete feature dictionaries frequently involves minimizing a cost function composed of a likelihood (or data fit) term and a prior (or penalty function) that favors sparsity. While typically the prior is factorial, here we examine non-factorial alternatives that have a number of desirable properties relevant to sparse estimation and are easily implemented using an efficient, globally-convergent reweighted $\ell_1$ minimization procedure. The first method under consideration arises from the sparse Bayesian learning (SBL) framework. Although based on a highly non-convex underlying cost function, in the context of canonical sparse estimation problems, we prove uniform superiority of this method over the Lasso in that, (i) it can never do worse, and (ii) for any dictionary and sparsity profile, there will always exist cases where it does better. These results challenge the prevailing reliance on strictly convex penalty functions for finding sparse solutions. We then derive a new non-factorial variant with similar properties that exhibits further performance improvements in empirical tests. For both of these methods, as well as traditional factorial analogs, we demonstrate the effectiveness of reweighted $\ell_1$-norm algorithms in handling more general sparse estimation problems involving classification, group feature selection, and non-negativity constraints. As a byproduct of this development, a rigorous reformulation of sparse Bayesian classification (e.g., the relevance vector machine) is derived that, unlike the original, involves no approximation steps and descends a well-defined objective function.


Learning in Markov Random Fields using Tempered Transitions

Neural Information Processing Systems

Markov random fields (MRFs), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRFs is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of Robbins-Monro type that uses Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large densely-connected MRFs. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data and perform well on digit and object recognition tasks.


Multi-Label Prediction via Sparse Infinite CCA

Neural Information Processing Systems

Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction.


Breaking Boundaries Between Induction Time and Diagnosis Time Active Information Acquisition

Neural Information Processing Systems

There has been a clear distinction between induction or training time and diagnosis time active information acquisition. While active learning during induction focuses on acquiring data that promises to provide the best classification model, the goal at diagnosis time focuses completely on next features to observe about the test case at hand in order to make better predictions about the case. We introduce a model and inferential methods that breaks this distinction. The methods can be used to extend case libraries under a budget but, more fundamentally, provide a framework for guiding agents to collect data under scarce resources, focused by diagnostic challenges. This extension to active learning leads to a new class of policies for real-time diagnosis, where recommended information-gathering sequences include actions that simultaneously seek new data for the case at hand and for cases in the training set.


Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise

Neural Information Processing Systems

Modern machine learning-based approaches to computer vision require very large databases of labeled images. Some contemporary vision systems already require on the order of millions of images for training (e.g., Omron face detector). While the collection of these large databases is becoming a bottleneck, new Internet-based services that allow labelers from around the world to be easily hired and managed provide a promising solution. However, using these services to label large databases brings with it new theoretical and practical challenges: (1) The labelers may have wide ranging levels of expertise which are unknown a priori, and in some cases may be adversarial; (2) images may vary in their level of difficulty; and (3) multiple labels for the same image must be combined to provide an estimate of the actual label of the image. Probabilistic approaches provide a principled way to approach these problems. In this paper we present a probabilistic model and use it to simultaneously infer the label of each image, the expertise of each labeler, and the difficulty of each image. On both simulated and real data, we demonstrate that the model outperforms the commonly used ``Majority Vote heuristic for inferring image labels, and is robust to both adversarial and noisy labelers.


Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

Neural Information Processing Systems

We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the "topics"). In the sparse topic model (sparseTM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the sparseTM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the sparseTM on four real-world datasets. Compared to traditional approaches, the empirical results will show that sparseTMs give better predictive performance with simpler inferred models.


Perceptual Multistability as Markov Chain Monte Carlo Inference

Neural Information Processing Systems

While many perceptual and cognitive phenomena are well described in terms of Bayesian inference, the necessary computations are intractable at the scale of real-world tasks, and it remains unclear how the human mind approximates Bayesian inference algorithmically. We explore the proposal that for some tasks, humans use a form of Markov Chain Monte Carlo to approximate the posterior distribution over hidden variables. As a case study, we show how several phenomena of perceptual multistability can be explained as MCMC inference in simple graphical models for low-level vision.