Making Latin Manuscripts Searchable using gHMM's

Neural Information Processing Systems

We describe a method that can make a scanned, handwritten mediaeval Latin manuscript accessible to full text search. A generalized HMM is fitted, using transcribed Latin to obtain a transition model and one example each of 22 letters to obtain an emission model. We show results for unigram, bigram and trigram models.


Bayesian inference in spiking neurons

Neural Information Processing Systems

We propose a new interpretation of spiking neurons as Bayesian integrators accumulating evidence over time about events in the external world or the body, and communicating to other neurons their certainties about these events. In this model, spikes signal the occurrence of new information, i.e. what cannot be predicted from the past activity. As a result, firing statistics are close to Poisson, albeit providing a deterministic representation of probabilities. We proceed to develop a theory of Bayesian inference in spiking neural networks, with recurrent interactions implementing a variant of belief propagation. Many perceptual and motor tasks performed by the central nervous system are probabilistic, and can be described in a Bayesian framework [4, 3].


A Machine Learning Approach to Conjoint Analysis

Neural Information Processing Systems

Choice-based conjoint analysis builds models of consumer preferences over products with answers gathered in questionnaires. Our main goal is to bring tools from the machine learning community to solve this problem more efficiently. Thus, we propose two algorithms to quickly and accurately estimate consumer preferences.


Incremental Algorithms for Hierarchical Classification

Neural Information Processing Systems

We study the problem of hierarchical classification when labels corresponding to partial and/or multiple paths in the underlying taxonomy are allowed. We introduce a new hierarchical loss function, the H-loss, implementing the simple intuition that additional mistakes in the subtree of a mistaken class should not be charged for. Based on a probabilistic data model introduced in earlier work, we derive the Bayes-optimal classifier for the H-loss. We then empirically compare two incremental approximations of the Bayes-optimal classifier with a flat SVM classifier and with classifiers obtained by using hierarchical versions of the Perceptron and SVM algorithms. The experiments show that our simplest incremental approximation of the Bayes-optimal classifier performs, after just one training epoch, nearly as well as the hierarchical SVM classifier (which performs best). For the same incremental algorithm we also derive an H-loss bound showing, when data are generated by our probabilistic data model, exponentially fast convergence to the H-loss of the hierarchical classifier based on the true model parameters.
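
The intuition behind the H-loss can be illustrated with a minimal sketch. It assumes unit costs per node, a taxonomy given as a child-to-parent map, and binary per-node labels; the function name `h_loss` and the toy taxonomy are hypothetical and not taken from the paper. A mispredicted node is charged only if every one of its ancestors was predicted correctly.

```python
def h_loss(parent, y_true, y_pred, cost=None):
    """Sketch of an H-loss with unit costs by default (an assumption).

    parent maps each node to its parent, or to None for a root.
    y_true and y_pred map each node to a 0/1 label.
    """
    total = 0.0
    for node in parent:
        if y_true[node] == y_pred[node]:
            continue  # no mistake at this node
        # Charge the mistake only if no ancestor is already mistaken.
        ancestor = parent[node]
        charged = True
        while ancestor is not None:
            if y_true[ancestor] != y_pred[ancestor]:
                charged = False
                break
            ancestor = parent[ancestor]
        if charged:
            total += 1.0 if cost is None else cost[node]
    return total


# Hypothetical toy taxonomy: a is the root, b and c are its children, d is under b.
parent = {"a": None, "b": "a", "c": "a", "d": "b"}
y_true = {"a": 1, "b": 1, "c": 0, "d": 1}
y_pred = {"a": 1, "b": 0, "c": 0, "d": 0}

loss = h_loss(parent, y_true, y_pred)
flat_mistakes = sum(y_true[n] != y_pred[n] for n in parent)
```

In this toy case the prediction is wrong at both b and d, so a per-node count gives two mistakes, while the sketch charges only b because d lies in the subtree of an already mistaken class.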


Dependent Gaussian Processes

Neural Information Processing Systems

Gaussian processes are usually parameterised in terms of their covariance functions. However, this makes it difficult to deal with multiple outputs, because ensuring that the covariance matrix is positive definite is problematic. An alternative formulation is to treat Gaussian processes as white noise sources convolved with smoothing kernels, and to parameterise the kernel instead. Using this, we extend Gaussian processes to handle multiple, coupled outputs.
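
The convolution view can be sketched on a discrete grid. The example below is only an illustration under stated assumptions: Gaussian-shaped smoothing kernels, a single shared white-noise source, and hypothetical names (`smoothing_matrix`, `joint_covariance`) and parameter values that do not come from the paper. Because both outputs are linear maps of the same noise, their joint covariance has the form A Aᵀ and is positive semi-definite by construction, with a non-zero cross-covariance block coupling the outputs.

```python
import math
import random


def smoothing_matrix(outputs_at, noise_at, length_scale, gain, step):
    """Discretised Gaussian smoothing kernel (an assumed kernel shape).

    Row i holds the weights that map the noise grid to the output at outputs_at[i].
    The sqrt(step) factor accounts for the variance of white noise on a grid.
    """
    return [
        [gain * math.exp(-0.5 * ((s - u) / length_scale) ** 2) * math.sqrt(step)
         for u in noise_at]
        for s in outputs_at
    ]


def joint_covariance(rows):
    """Covariance A A^T of outputs that are linear maps of unit white noise."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


step = 0.25
noise_at = [i * step for i in range(-20, 61)]      # noise grid on [-5, 15]
inputs = [0.0, 2.5, 5.0, 7.5, 10.0]

k1 = smoothing_matrix(inputs, noise_at, length_scale=1.0, gain=1.0, step=step)
k2 = smoothing_matrix(inputs, noise_at, length_scale=2.0, gain=0.5, step=step)

# Stack both outputs: the 10 x 10 joint covariance includes the cross terms.
cov = joint_covariance(k1 + k2)

# Draw one coupled sample by pushing the same noise through both kernels.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in noise_at]
y1 = [sum(w * x for w, x in zip(row, noise)) for row in k1]
y2 = [sum(w * x for w, x in zip(row, noise)) for row in k2]
```

The design point is that only the kernels are parameterised (here by `length_scale` and `gain`), so no separate check is needed to keep the multi-output covariance valid.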


Nonlinear Blind Source Separation by Integrating Independent Component Analysis and Slow Feature Analysis

Neural Information Processing Systems

In contrast to the equivalence of linear blind source separation and linear independent component analysis, it is not possible to recover the original source signal from some unknown nonlinear transformations of the sources using only the independence assumption. Integrating the objectives of statistical independence and temporal slowness removes this indeterminacy, leading to a new method for nonlinear blind source separation. The principle of temporal slowness is adopted from slow feature analysis, an unsupervised method to extract slowly varying features from a given observed vectorial signal. The performance of the algorithm is demonstrated on nonlinearly mixed speech data.


At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks

Neural Information Processing Systems

In this paper we analyze the relationship between the computational capabilities of randomly connected networks of threshold gates in the time series domain and their dynamical properties. In particular we propose a complexity measure which we find to assume its highest values near the edge of chaos, i.e. the transition from ordered to chaotic dynamics. Furthermore we show that the proposed complexity measure predicts the computational capabilities very well: only near the edge of chaos are such networks able to perform complex computations on time series. Additionally a simple synaptic scaling rule for self-organized criticality is presented and analyzed.


Who's In the Picture

Neural Information Processing Systems

The context in which a name appears in a caption provides powerful cues as to who is depicted in the associated image. We obtain 44,773 face images, using a face detector, from approximately half a million captioned news images and automatically link names, obtained using a named entity recognizer, with these faces. A simple clustering method can produce fair results. We improve these results significantly by combining the clustering process with a model of the probability that an individual is depicted given its context. Once the labeling procedure is over, we have an accurately labeled set of faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation.


The power of feature clustering: An application to object detection

Neural Information Processing Systems

We give a fast rejection scheme that is based on image segments and demonstrate it on the canonical example of face detection. However, instead of focusing on the detection step we focus on the rejection step and show that our method is simple and fast to learn, thus making it an excellent pre-processing step to accelerate standard machine learning classifiers, such as neural networks, Bayes classifiers or SVMs. We decompose a collection of face images into regions of pixels with similar behavior over the image set. The relationships between the mean and variance of image segments are used to form a cascade of rejectors that can reject over 99.8% of image patches, so that only a small fraction of the image patches must be passed to a full-scale classifier. Moreover, the training time for our method is much less than an hour on a standard PC.