AITopics

The coding of information by neural populations depends critically on the statistical dependencies between neuronal responses. However, there is no simple model that combines the observations that (1) marginal distributions over single-neuron spike counts are often approximately Poisson; and (2) joint distributions over the responses of multiple neurons are often strongly dependent. Here, we show that both marginal and joint properties of neural responses can be captured using Poisson copula models. Copulas are joint distributions that allow random variables with arbitrary marginals to be combined while incorporating arbitrary dependencies between them. Different copulas capture different kinds of dependencies, allowing for a richer and more detailed description of dependencies than traditional summary statistics, such as correlation coefficients. We explore a variety of Poisson copula models for joint neural response distributions, and derive an efficient maximum likelihood procedure for estimating them. We apply these models to neuronal data collected in and macaque motor cortex, and quantify the improvement in coding accuracy afforded by incorporating the dependency structure between pairs of neurons.

copula, dependency, neuron, (16 more...)

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Waltham (0.04)
Europe > United Kingdom (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.48)

Amini, Massih, Usunier, Nicolas, Laviolette, François

A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning

In this paper we present two transductive bounds on the risk of the majority vote estimated over partially labeled training sets. Our first bound is tight when the additional unlabeled training data are used in the cases where the voted classifier makes its errors on low margin observations and where the errors of the associated Gibbs classifier can accurately be estimated. In semi-supervised learning, considering the margin as an indicator of confidence constitutes the working hypothesis of algorithms which search the decision boundary on low density regions. In this case, we propose a second bound on the joint probability that the voted classifier makes an error over an example having its margin over a fixed threshold. As an application we are interested on self-learning algorithms which assign iteratively pseudo-labels to unlabeled training examples having margin above a threshold obtained from this bound. Empirical results on different datasets show the effectiveness of our approach compared to the same algorithm and the TSVM in which the threshold is fixed manually.

algorithm, classifier, threshold, (16 more...)

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre:

Research Report (0.46)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)

Yang, Shuang-hong, Zha, Hongyuan, Hu, Bao-gang

Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora

We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation for web search queries respectively.

classification, dba, dirichlet-bernoulli alignment, (16 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States (0.04)
Asia > China (0.04)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Wipf, David P., Nagarajan, Srikantan S.

Sparse Estimation Using General Likelihoods and Non-Factorial Priors

Finding maximally sparse representations from overcomplete feature dictionaries frequently involves minimizing a cost function composed of a likelihood (or data fit) term and a prior (or penalty function) that favors sparsity. While typically the prior is factorial, here we examine non-factorial alternatives that have a number of desirable properties relevant to sparse estimation and are easily implemented using an efficient, globally-convergent reweighted $\ell_1$ minimization procedure. The first method under consideration arises from the sparse Bayesian learning (SBL) framework. Although based on a highly non-convex underlying cost function, in the context of canonical sparse estimation problems, we prove uniform superiority of this method over the Lasso in that, (i) it can never do worse, and (ii) for any dictionary and sparsity profile, there will always exist cases where it does better. These results challenge the prevailing reliance on strictly convex penalty functions for finding sparse solutions. We then derive a new non-factorial variant with similar properties that exhibits further performance improvements in empirical tests. For both of these methods, as well as traditional factorial analogs, we demonstrate the effectiveness of reweighted $\ell_1$-norm algorithms in handling more general sparse estimation problems involving classification, group feature selection, and non-negativity constraints. As a byproduct of this development, a rigorous reformulation of sparse Bayesian classification (e.g., the relevance vector machine) is derived that, unlike the original, involves no approximation steps and descends a well-defined objective function.

cost function, iteration, sbl, (15 more...)

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Learning in Markov Random Fields using Tempered Transitions

Salakhutdinov, Ruslan R.

Markov random fields (MRFs), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRFs is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of Robbins-Monro type that uses Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large densely-connected MRFs. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data and perform well on digit and object recognition tasks.

algorithm, gibbs update, transition, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)

Multi-Label Prediction via Sparse Infinite CCA

Rai, Piyush, Daume, Hal

Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction.

cca, dimensionality reduction, matrix, (13 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Utah (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Kapoor, Ashish, Horvitz, Eric

Breaking Boundaries Between Induction Time and Diagnosis Time Active Information Acquisition

There has been a clear distinction between induction or training time and diagnosis time active information acquisition. While active learning during induction focuses on acquiring data that promises to provide the best classification model, the goal at diagnosis time focuses completely on next features to observe about the test case at hand in order to make better predictions about the case. We introduce a model and inferential methods that breaks this distinction. The methods can be used to extend case libraries under a budget but, more fundamentally, provide a framework for guiding agents to collect data under scarce resources, focused by diagnostic challenges. This extension to active learning leads to a new class of policies for real-time diagnosis, where recommended information-gathering sequences include actions that simultaneously seek new data for the case at hand and for cases in the training set.

acquisition, information, information acquisition, (15 more...)

Country: North America > United States > Washington > King County > Redmond (0.04)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Whitehill, Jacob, Wu, Ting-fan, Bergsma, Jacob, Movellan, Javier R., Ruvolo, Paul L.

Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise

Modern machine learning-based approaches to computer vision require very large databases of labeled images. Some contemporary vision systems already require on the order of millions of images for training (e.g., Omron face detector). While the collection of these large databases is becoming a bottleneck, new Internet-based services that allow labelers from around the world to be easily hired and managed provide a promising solution. However, using these services to label large databases brings with it new theoretical and practical challenges: (1) The labelers may have wide ranging levels of expertise which are unknown a priori, and in some cases may be adversarial; (2) images may vary in their level of difficulty; and (3) multiple labels for the same image must be combined to provide an estimate of the actual label of the image. Probabilistic approaches provide a principled way to approach these problems. In this paper we present a probabilistic model and use it to simultaneously infer the label of each image, the expertise of each labeler, and the difficulty of each image. On both simulated and real data, we demonstrate that the model outperforms the commonly used ``Majority Vote heuristic for inferring image labels, and is robust to both adversarial and noisy labelers.

artificial intelligence, labeler, machine learning, (16 more...)

Country: North America > United States > California (0.28)

Genre: Research Report (0.66)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Wang, Chong, Blei, David M.

Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the "topics"). In the sparse topic model (sparseTM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the sparseTM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the sparseTM on four real-world datasets. Compared to traditional approaches, the empirical results will show that sparseTMs give better predictive performance with simpler inferred models.

machine learning, natural language, sparsetm, (15 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Gershman, Samuel, Vul, Ed, Tenenbaum, Joshua B.

Perceptual Multistability as Markov Chain Monte Carlo Inference

While many perceptual and cognitive phenomena are well described in terms of Bayesian inference, the necessary computations are intractable at the scale of real-world tasks, and it remains unclear how the human mind approximates Bayesian inference algorithmically. We explore the proposal that for some tasks, humans use a form of Markov Chain Monte Carlo to approximate the posterior distribution over hidden variables. As a case study, we show how several phenomena of perceptual multistability can be explained as MCMC inference in simple graphical models for low-level vision.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country: North America > United States > Massachusetts (0.28)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)