 Statistical Learning


How Linear are Auditory Cortical Responses?

Neural Information Processing Systems

By comparison to some other sensory cortices, the functional properties of cells in the primary auditory cortex are not yet well understood. Recent attempts to obtain a generalized description of auditory cortical responses have often relied upon characterization of the spectrotemporal receptive field (STRF), which amounts to a model of the stimulus-response function (SRF) that is linear in the spectrogram of the stimulus.
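
As an illustration of what "linear in the spectrogram" means, the sketch below predicts a firing rate as a weighted sum of recent spectrogram frames and recovers the weights by least squares. It is a minimal sketch on synthetic data, not the estimation procedure of the paper; the function names, the lag count, and the optional `ridge` term are assumptions introduced here.

```python
import numpy as np

def lagged_design(spec, n_lags):
    """Stack the current and the previous n_lags - 1 spectrogram frames per time bin.

    spec has shape (n_freq, n_time); the result has shape (n_time, n_freq * n_lags).
    Frames before the start of the recording are treated as zeros.
    """
    n_freq, n_time = spec.shape
    X = np.zeros((n_time, n_freq * n_lags))
    for lag in range(n_lags):
        # Column block `lag` holds the spectrogram delayed by `lag` time bins.
        X[lag:, lag * n_freq:(lag + 1) * n_freq] = spec[:, :n_time - lag].T
    return X

def fit_strf(spec, rate, n_lags, ridge=0.0):
    """Least-squares (optionally ridge-regularised) estimate of a linear STRF."""
    X = lagged_design(spec, n_lags)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ rate)
    return w.reshape(n_lags, spec.shape[0])  # rows: lag, columns: frequency

# Synthetic check: a response that really is linear in the spectrogram.
rng = np.random.default_rng(0)
spec = rng.standard_normal((4, 500))        # hypothetical 4-channel spectrogram
true_strf = rng.standard_normal((3, 4))     # hypothetical 3-lag receptive field
rate = lagged_design(spec, 3) @ true_strf.ravel()
est_strf = fit_strf(spec, rate, n_lags=3)
```

When the response is exactly linear and noise-free, as in this synthetic case, the fitted weights match the generating ones; how far real cortical responses depart from this model is the question the paper's title poses.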


Bayesian Models of Inductive Generalization

Neural Information Processing Systems

We argue that human inductive generalization is best explained in a Bayesian framework, rather than by traditional models based on similarity computations. We go beyond previous work on Bayesian concept learning by introducing an unsupervised method for constructing flexible hypothesis spaces, and we propose a version of the Bayesian Occam's razor that trades off priors and likelihoods to prevent under- or over-generalization in these flexible spaces. We analyze two published data sets on inductive reasoning as well as the results of a new behavioral study that we have carried out.


A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains

Neural Information Processing Systems

We develop a maximum entropy (maxent) approach to generating recommendations in the context of a user's current navigation stream, suitable for environments where data is sparse, high-dimensional, and dynamic (conditions typical of many recommendation applications). We address sparsity and dimensionality reduction by first clustering items based on user access patterns so as to attempt to minimize the a priori probability that recommendations will cross cluster boundaries and then recommending only within clusters. We address the inherent dynamic nature of the problem by explicitly modeling the data as a time series; we show how this representational expressivity fits naturally into a maxent framework.


Identity Uncertainty and Citation Matching

Neural Information Processing Systems

Identity uncertainty is a pervasive problem in real-world data analysis. It arises whenever objects are not labeled with unique identifiers or when those identifiers may not be perceived perfectly. In such cases, two observations may or may not correspond to the same object. In this paper, we consider the problem in the context of citation matching: the problem of deciding which citations correspond to the same publication. Our approach is based on the use of a relational probability model to define a generative model for the domain, including models of author and title corruption and a probabilistic citation grammar. Identity uncertainty is handled by extending standard models to incorporate probabilities over the possible mappings between terms in the language and objects in the domain. Inference is based on Markov chain Monte Carlo, augmented with specific methods for generating efficient proposals when the domain contains many objects. Results on several citation data sets show that the method outperforms current algorithms for citation matching. The declarative, relational nature of the model also means that our algorithm can determine object characteristics such as author names by combining multiple citations of multiple papers.


Transductive and Inductive Methods for Approximate Gaussian Process Regression

Neural Information Processing Systems

Gaussian process regression allows a simple analytical treatment of exact Bayesian inference and has been found to provide good performance, yet scales badly with the number of training data. In this paper we compare several approaches towards scaling Gaussian process regression to large data sets: the subset of representers method, the reduced rank approximation, online Gaussian processes, and the Bayesian committee machine. Furthermore, we provide theoretical insight into some of our experimental results. We found that subset of representers methods can give good and particularly fast predictions for data sets with high and medium noise levels. On complex low noise data sets, the Bayesian committee machine achieves significantly better accuracy, yet at a higher computational cost.
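
To make the scaling issue concrete, the sketch below contrasts the exact Gaussian process predictive mean, which factorises an n x n kernel matrix, with a subset-based approximation that only solves an m x m system for m chosen points. It assumes the subset approximation takes the common form in which the predictor is restricted to kernels centred on the chosen points; the variant studied in the paper may differ. The RBF kernel, the noise variance, and all function names are assumptions made for illustration.

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_mean_exact(X, y, Xs, noise=0.1):
    """Exact predictive mean: O(n^3) because of the n x n Cholesky factorisation."""
    K = rbf(X, X) + noise * np.eye(len(X))   # `noise` is the noise variance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(Xs, X) @ alpha

def gp_mean_subset(X, y, Xs, idx, noise=0.1):
    """Predictive mean using only kernels centred on X[idx]: O(n m^2) for m = len(idx)."""
    Xm = X[idx]
    Kmn = rbf(Xm, X)
    Kmm = rbf(Xm, Xm)
    A = noise * Kmm + Kmn @ Kmn.T            # m x m system instead of n x n
    w = np.linalg.solve(A, Kmn @ y)
    return rbf(Xs, Xm) @ w

# Small synthetic example (hypothetical data).
X = np.linspace(0.0, 7.0, 8)[:, None]
y = np.sin(X[:, 0])
Xs = np.array([[0.5], [3.3]])
exact = gp_mean_exact(X, y, Xs)
approx_all = gp_mean_subset(X, y, Xs, idx=np.arange(8))       # every point kept
approx_half = gp_mean_subset(X, y, Xs, idx=np.arange(0, 8, 2))  # every second point
```

With every training point kept, this form of the approximation reduces algebraically to the exact mean; with fewer points it trades accuracy for speed, which is the trade-off the comparison above is concerned with.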


Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems

Neural Information Processing Systems

We introduce a family of classifiers based on a physical analogy to an electrostatic system of charged conductors. The family, called Coulomb classifiers, includes the two best-known support-vector machines (SVMs), the ν-SVM and the C-SVM. In the electrostatics analogy, a training example corresponds to a charged conductor at a given location in space, the classification function corresponds to the electrostatic potential function, and the training objective function corresponds to the Coulomb energy. The electrostatic framework not only provides a novel interpretation of existing algorithms and their interrelationships, but also suggests a variety of new methods for SVMs, including kernels that bridge the gap between polynomial and radial-basis functions, objective functions that do not require positive-definite kernels, and regularization techniques that allow for the construction of an optimal classifier in Minkowski space. Based on the framework, we propose novel SVMs and perform simulation studies to show that they are comparable or superior to standard SVMs. The experiments include classification tasks on data which are represented in terms of their pairwise proximities, where a Coulomb Classifier outperformed standard SVMs.


Margin Analysis of the LVQ Algorithm

Neural Information Processing Systems

Prototype-based algorithms are commonly used to reduce the computational complexity of Nearest-Neighbour (NN) classifiers. In this paper we discuss theoretical and algorithmic aspects of such algorithms. On the theory side, we present margin-based generalization bounds that suggest that these kinds of classifiers can be more accurate than the 1-NN rule. Furthermore, we derive a training algorithm that selects a good set of prototypes using large margin principles. We also show that the 20-year-old Learning Vector Quantization (LVQ) algorithm emerges naturally from our framework.
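
For readers unfamiliar with LVQ, the sketch below shows the classical LVQ1 update: the prototype nearest to a training point moves toward it when their labels agree and away from it otherwise, and classification is nearest-prototype. This is the textbook rule only, not the margin-based training algorithm derived in the paper; the learning rate, epoch count, initialisation, and synthetic data are assumptions.

```python
import numpy as np

def lvq1_fit(X, y, protos, proto_labels, lr=0.1, epochs=10, seed=0):
    """Classical LVQ1: pull the nearest prototype toward same-class points, push otherwise."""
    rng = np.random.default_rng(seed)
    P = protos.astype(float).copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = np.argmin(((P - X[i]) ** 2).sum(axis=1))   # index of nearest prototype
            direction = 1.0 if proto_labels[j] == y[i] else -1.0
            P[j] += direction * lr * (X[i] - P[j])
    return P

def lvq_predict(X, P, proto_labels):
    """Assign each point the label of its nearest prototype."""
    d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    return proto_labels[d2.argmin(axis=1)]

# Two well-separated synthetic clusters, one prototype per class (hypothetical data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2.0, 0.0], 0.3, size=(20, 2)),
               rng.normal([2.0, 0.0], 0.3, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)
proto_labels = np.array([0, 1])
P = lvq1_fit(X, y, protos=X[[0, 20]], proto_labels=proto_labels)
accuracy = (lvq_predict(X, P, proto_labels) == y).mean()
```

At prediction time only the prototypes are compared against a query, which is where the saving over a full nearest-neighbour search comes from.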


Kernel Dependency Estimation

Neural Information Processing Systems

Jason Weston, Olivier Chapelle, Andre Elisseeff, Bernhard Schölkopf and Vladimir Vapnik*
Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany
*NEC Research Institute, Princeton, NJ 08540 USA

Abstract

We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be, for example, vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.

1 Introduction

In this article we consider the rather general learning problem of finding a dependency between inputs $x \in X$ and outputs $y \in Y$ given a training set $(x_1, y_1), \ldots, (x_m, y_m)$. This includes conventional pattern recognition and regression estimation. It also encompasses more complex dependency estimation tasks, e.g., mapping of a certain class of strings to a certain class of graphs (as in text parsing) or the mapping of text descriptions to images.


Boosted Dyadic Kernel Discriminants

Neural Information Processing Systems

We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer's conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVM. Similarly to SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an online and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy).
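
A single dyadic hypercut can be pictured as follows: take one positive and one negative training point and separate them with the hyperplane whose normal is their difference in feature space. The sketch below assumes the threshold is placed so that the hyperplane bisects the pair, which is one natural choice and not necessarily the parametrisation used in the paper; the boosting stage is not shown. All names and the example points are hypothetical.

```python
import numpy as np

def linear_kernel(a, b):
    return np.sum(a * b, axis=-1)

def rbf_kernel(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

def dyadic_hypercut(x_pos, x_neg, kernel):
    """Discriminant defined by one pair of opposite-class points.

    The score is positive where x is closer (in feature space) to x_pos than
    to x_neg. The offset places the decision boundary at the bisector of the
    pair, which is an assumption of this sketch.
    """
    offset = 0.5 * (kernel(x_pos, x_pos) - kernel(x_neg, x_neg))

    def score(x):
        # Two kernel evaluations per hypercut at classification time.
        return kernel(x, x_pos) - kernel(x, x_neg) - offset

    return score

x_pos = np.array([1.0, 2.0])
x_neg = np.array([-1.0, 0.0])
h_lin = dyadic_hypercut(x_pos, x_neg, linear_kernel)
h_rbf = dyadic_hypercut(x_pos, x_neg, rbf_kernel)
midpoint = 0.5 * (x_pos + x_neg)
```

Because each hypercut costs two kernel evaluations, the size of a boosted ensemble of such discriminants directly bounds the run-time cost, which is the controllable complexity mentioned above.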


Information Diffusion Kernels

Neural Information Processing Systems

A new family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. Based on the heat equation on the Riemannian manifold defined by the Fisher information metric, information diffusion kernels generalize the Gaussian kernel of Euclidean space, and provide a natural way of combining generative statistical modeling with nonparametric discriminative learning. As a special case, the kernels give a new approach to applying kernel-based learning algorithms to discrete data. Bounds on covering numbers for the new kernels are proved using spectral theory in differential geometry, and experimental results are presented for text classification.
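
To give a feel for such a kernel on discrete data, the sketch below assumes the multinomial family (a common model for word frequencies in text). Under the Fisher information metric the geodesic distance between two multinomial parameter vectors p and q is 2 arccos(sum_i sqrt(p_i q_i)), and a leading-order approximation to the heat kernel replaces the Euclidean distance in the Gaussian kernel with this geodesic distance. The normalising constant is omitted, and the function name and the diffusion time `t` are assumptions; this is an illustration, not necessarily the exact kernel evaluated in the paper.

```python
import numpy as np

def multinomial_diffusion_kernel(p, q, t=0.5):
    """Leading-order heat-kernel approximation on the multinomial simplex.

    p and q are probability vectors (non-negative, summing to one).
    The normalising constant is dropped, so the kernel of a point with itself is 1.
    """
    # Clip to guard arccos against rounding slightly outside [0, 1].
    affinity = np.clip(np.sum(np.sqrt(p * q), axis=-1), 0.0, 1.0)
    geodesic = 2.0 * np.arccos(affinity)       # Fisher geodesic distance
    return np.exp(-geodesic**2 / (4.0 * t))

# Hypothetical term-frequency vectors for two short documents.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.1, 0.8])
k_pp = multinomial_diffusion_kernel(p, p)
k_pq = multinomial_diffusion_kernel(p, q)
k_qp = multinomial_diffusion_kernel(q, p)
```

The structure mirrors the Euclidean Gaussian kernel exp(-d^2 / 4t), with the distance d measured along the statistical manifold instead of in the ambient space.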