Who's In the Picture

Neural Information Processing Systems

The context in which a name appears in a caption provides powerful cues as to who is depicted in the associated image. We obtain 44,773 face images, using a face detector, from approximately half a million captioned news images and automatically link names, obtained using a named entity recognizer, with these faces. A simple clustering method can produce fair results. We improve these results significantly by combining the clustering process with a model of the probability that an individual is depicted given its context. Once the labeling procedure is over, we have an accurately labeled set of faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation.


Non-Local Manifold Tangent Learning

Neural Information Processing Systems

We claim and present arguments to the effect that a large class of manifold learning algorithms that are essentially local and can be framed as kernel learning algorithms will suffer from the curse of dimensionality, at the dimension of the true underlying manifold. This observation suggests exploring non-local manifold learning algorithms which attempt to discover shared structure in the tangent planes at different positions. A criterion for such an algorithm is proposed and experiments estimating a tangent plane prediction function are presented, showing its advantages with respect to local manifold learning algorithms: it is able to generalize very far from training data (on learning handwritten character image rotations), where a local nonparametric method fails.


Maximising Sensitivity in a Spiking Network

Neural Information Processing Systems

We use unsupervised probabilistic machine learning ideas to try to explain the kinds of learning observed in real neurons, the goal being to connect abstract principles of self-organisation to known biophysical processes. For example, we would like to explain Spike Timing-Dependent Plasticity (see [5,6] and Figure 3A), in terms of information theory. Starting out, we explore the optimisation of a network sensitivity measure related to maximising the mutual information between input spike timings and output spike timings. Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings is maximised, rather than the sensitivity of output 'firing rates' to inputs. ICA and related approaches have been successful in explaining the learning of many properties of early visual receptive fields in rate coding models, and we are hoping for similar gains in understanding of spike coding in networks, and how this is supported, in principled probabilistic ways, by cellular biophysical processes. For now, in our initial simulations, we show that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. That is, it can recover independent point processes embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network.


Large-Scale Prediction of Disulphide Bond Connectivity

Neural Information Processing Systems

The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches, which restrict predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known and to situations where it is unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain.
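
The matching step described above can be illustrated with a minimal sketch. It assumes an even number of bonded cysteines, a symmetric matrix of pairwise bonding probabilities, and a score equal to the sum of the selected probabilities; the function name `best_matching` and the example numbers are hypothetical, and the exhaustive search is only for illustration (it is exponential, whereas the abstract refers to an efficient solution).

```python
def best_matching(weights):
    """Exhaustively find the perfect matching with the largest total weight.

    weights: symmetric square matrix (list of lists) of pairwise scores,
             e.g. predicted bonding probabilities between cysteines.
    Returns (total_weight, list_of_pairs).
    Assumption: the number of cysteines is even, so a perfect matching exists.
    """
    n = len(weights)
    if n % 2 != 0:
        raise ValueError("this sketch needs an even number of cysteines")

    def solve(remaining):
        # No cysteines left: the empty matching has weight 0.
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best_total, best_pairs = float("-inf"), []
        # Pair the first unmatched cysteine with every possible partner.
        for k, partner in enumerate(rest):
            others = rest[:k] + rest[k + 1:]
            sub_total, sub_pairs = solve(others)
            total = weights[first][partner] + sub_total
            if total > best_total:
                best_total = total
                best_pairs = [(first, partner)] + sub_pairs
        return best_total, best_pairs

    return solve(tuple(range(n)))


# Hypothetical bonding probabilities for four cysteines (made-up numbers).
example = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.3, 0.1],
    [0.1, 0.3, 0.0, 0.8],
    [0.2, 0.1, 0.8, 0.0],
]
total, pairs = best_matching(example)
print(pairs, total)
```

For real chain lengths a polynomial-time maximum weight matching routine would replace the recursion; the sketch only shows how pairwise probabilities turn into a connectivity pattern.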


Co-Training and Expansion: Towards Bridging Theory and Practice

Neural Information Processing Systems

Co-training is a method for combining labeled and unlabeled data when examples can be thought of as containing two distinct sets of features. It has had a number of practical successes, yet previous theoretical analyses have needed very strong assumptions on the data that are unlikely to be satisfied in practice. In this paper, we propose a much weaker "expansion" assumption on the underlying data distribution, that we prove is sufficient for iterative co-training to succeed given appropriately strong PAC-learning algorithms on each feature set, and that to some extent is necessary as well. This expansion assumption in fact motivates the iterative nature of the original co-training algorithm, unlike stronger assumptions (such as independence given the label) that allow a simpler one-shot co-training to succeed. We also heuristically analyze the effect on performance of noise in the data. Predicted behavior is qualitatively matched in synthetic experiments on expander graphs.


Computing regularization paths for learning multiple kernels

Neural Information Processing Systems

A sparse conic combination of kernel functions or kernel matrices for classification or regression can be learned via regularization by a block 1-norm [1]. In this paper, we present an algorithm that computes the entire regularization path for these problems. The path is obtained by using numerical continuation techniques, and involves a running time complexity that is a constant times the complexity of solving the problem for one value of the regularization parameter. Working in the setting of kernel linear regression and kernel logistic regression, we show empirically that the effect of the block 1-norm regularization differs notably from the (non-block) 1-norm regularization commonly used for variable selection, and that the regularization path is of particular value in the block case.



The power of feature clustering: An application to object detection

Neural Information Processing Systems

We give a fast rejection scheme that is based on image segments and demonstrate it on the canonical example of face detection. However, instead of focusing on the detection step we focus on the rejection step and show that our method is simple and fast to learn, thus making it an excellent pre-processing step to accelerate standard machine learning classifiers, such as neural networks, Bayes classifiers or SVMs. We decompose a collection of face images into regions of pixels with similar behavior over the image set. The relationships between the mean and variance of image segments are used to form a cascade of rejectors that can reject over 99.8% of image patches, so only a small fraction of the image patches must be passed to a full-scale classifier. Moreover, the training time for our method is much less than an hour on a standard PC.


Comparing Beliefs, Surveys, and Random Walks

Neural Information Processing Systems

It consists of an ensemble of randomly generated logical expressions, each depending on N Boolean variables x_i, and constructed by taking the AND of M clauses. Each clause a consists of the OR of 3 "literals" y_{i,a}.
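
The construction above can be sketched in a few lines. The excerpt does not say how variables and negations are drawn, so this sketch assumes three distinct variables chosen uniformly per clause and each literal negated with probability 1/2; the names `random_3sat` and `evaluate` are hypothetical.

```python
import random


def random_3sat(n_vars, n_clauses, rng):
    """Draw one expression: a list of clauses, each with 3 literals.

    A literal is a pair (i, negated): variable index i, and whether the
    literal is NOT x_i (negated=True) or x_i itself (negated=False).
    Assumption: distinct variables per clause, negation with probability 1/2.
    """
    formula = []
    for _ in range(n_clauses):
        variables = rng.sample(range(n_vars), 3)
        clause = [(i, rng.random() < 0.5) for i in variables]
        formula.append(clause)
    return formula


def evaluate(formula, assignment):
    """AND over all clauses of the OR of each clause's literals."""
    return all(
        # A literal is true when the variable's value differs from 'negated'.
        any(assignment[i] != negated for i, negated in clause)
        for clause in formula
    )


rng = random.Random(0)          # fixed seed so runs are repeatable
formula = random_3sat(n_vars=10, n_clauses=20, rng=rng)
assignment = [rng.random() < 0.5 for _ in range(10)]
print(evaluate(formula, assignment))
```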


A Direct Formulation for Sparse PCA Using Semidefinite Programming

Neural Information Processing Systems

We examine the problem of approximating, in the Frobenius-norm sense, a positive semidefinite symmetric matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming based relaxation for our problem.
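
The abstract does not write the formulation out, so the following is only a sketch of how it is usually stated. The symbols are assumptions: $A$ is the symmetric positive semidefinite matrix, $x$ the candidate vector, $k$ the cardinality bound, and $X$ a matrix variable standing in for $xx^T$. The first problem is the cardinality-constrained variational form of the largest eigenvalue; the second drops the rank-one requirement on $X$ and replaces the cardinality constraint with a convex bound on the sum of absolute entries.

```latex
% Cardinality-constrained variational form (non-convex):
\begin{aligned}
\max_{x}\quad & x^{T} A x \\
\text{s.t.}\quad & \|x\|_{2} = 1, \quad \mathbf{Card}(x) \le k
\end{aligned}

% One semidefinite relaxation, with X in place of x x^T:
\begin{aligned}
\max_{X}\quad & \mathbf{Tr}(A X) \\
\text{s.t.}\quad & \mathbf{Tr}(X) = 1, \quad \mathbf{1}^{T} |X| \mathbf{1} \le k, \quad X \succeq 0
\end{aligned}
```

The bound $\mathbf{1}^{T}|X|\mathbf{1} \le k$ follows from the original constraints: if $X = xx^T$ with $\|x\|_2 = 1$ and at most $k$ nonzero entries, then $\mathbf{1}^{T}|X|\mathbf{1} = \|x\|_1^2 \le k\,\|x\|_2^2 = k$.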