Goto

Collaborating Authors

 Technology


Predicting human gaze using low-level saliency combined with face detection

Neural Information Processing Systems

Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models have aimed at predicting such voluntary attentional shifts. Although the importance of high level stimulus properties (higher order statistics, semantics) stands undisputed, most models are based on low-level features of the input alone. In this study we recorded eye-movements of human observers while they viewed photographs of natural scenes. About two thirds of the stimuli contained at least one person. We demonstrate that a combined model of face detection and low-level saliency clearly outperforms a low-level model in predicting locations humans fixate. This is reflected in our finding fact that observes, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations (500ms). Remarkably, the model's predictive performance in images that do not contain faces is not impaired by spurious face detector responses, which is suggestive of a bottom-up mechanism for face detection. In summary, we provide a novel computational approach which combines high level object knowledge (in our case: face locations) with low-level features to successfully predict the allocation of attentional resources.


A learning framework for nearest neighbor search

Neural Information Processing Systems

Can we leverage learning techniques to build a fast nearest-neighbor (NN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.


Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks

Neural Information Processing Systems

We propose a model that leverages the millions of clicks received by web search engines, to predict document relevance. This allows the comparison of ranking functions when clicks are available but complete relevance judgments are not. After an initial training phase using a set of relevance judgments paired with click data, we show that our model can predict the relevance score of documents that have not been judged. These predictions can be used to evaluate the performance of a search engine, using our novel formalization of the confidence of the standard evaluation metric discounted cumulative gain (DCG), so comparisons can be made across time and datasets. This contrasts with previous methods which can provide only pair-wise relevance judgements between results shown for the same query. When no relevance judgments are available, we can identify the better of two ranked lists up to 82% of the time, and with only two relevance judgments for each query, we can identify the better ranking up to 94% of the time. While our experiments are on sponsored search results, which is the financial backbone of web search, our method is general enough to be applicable to algorithmic web search results as well. Furthermore, we give an algorithm to guide the selection of additional documents to judge to improve confidence.


Discriminative Keyword Selection Using Support Vector Machines

Neural Information Processing Systems

Many tasks in speech processing involve classification of long term characteristics of a speech segment such as language, speaker, dialect, or topic. A natural technique fordetermining these characteristics is to first convert the input speech into a sequence of tokens such as words, phones, etc. From these tokens, we can then look for distinctive sequences, keywords, that characterize the speech. In many applications, a set of distinctive keywords may not be known a priori. In this case, an automatic method of building up keywords from short context units such as phones is desirable. We propose a method for the construction of keywords based upon Support Vector Machines. We cast the problem of keyword selection as a feature selection problem for n-grams of phones. We propose an alternating filter-wrappermethod that builds successively longer keywords. Application of this method to language recognition and topic recognition tasks shows that the technique produces interesting and significant qualitative and quantitative results.


The Distribution Family of Similarity Distances

Neural Information Processing Systems

Assessing similarity between features is a key step in object recognition and scene categorization tasks. We argue that knowledge on the distribution of distances generated by similarity functions is crucial in deciding whether features are similar or not. Intuitively one would expect that similarities between features could arise from any distribution. In this paper, we will derive the contrary, and report the theoretical result that $L_p$-norms --a class of commonly applied distance metrics-- from one feature vector to other vectors are Weibull-distributed if the feature values are correlated and non-identically distributed. Besides these assumptions being realistic for images, we experimentally show them to hold for various popular feature extraction algorithms, for a diverse range of images. This fundamental insight opens new directions in the assessment of feature similarity, with projected improvements in object and scene recognition algorithms. Erratum: The authors of paper have declared that they have become convinced that the reasoning in the reference is too simple as a proof of their claims. As a consequence, they withdraw their theorems.


Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

Neural Information Processing Systems

We show that under suitable assumptions (primarily linearization) a simple and perspicuous online learning rule for Information Bottleneck optimization with spiking neurons can be derived. This rule performs on common benchmark tasks as well as a rather complex rule that has previously been proposed \cite{KlampflETAL:07b}. Furthermore, the transparency of this new learning rule makes a theoretical analysis of its convergence properties feasible. A variation of this learning rule (with sign changes) provides a theoretically founded method for performing Principal Component Analysis {(PCA)} with spiking neurons. By applying this rule to an ensemble of neurons, different principal components of the input can be extracted. In addition, it is possible to preferentially extract those principal components from incoming signals $X$ that are related or are not related to some additional target signal $Y_T$. In a biological interpretation, this target signal $Y_T$ (also called relevance variable) could represent proprioceptive feedback, input from other sensory modalities, or top-down signals.


FilterBoost: Regression and Classification on Large Datasets

Neural Information Processing Systems

We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm's strong theoretical properties forboth classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification.


A Probabilistic Approach to Language Change

Neural Information Processing Systems

We present a probabilistic approach to language change in which word forms are represented by phoneme sequences that undergo stochastic edits along the branches of a phylogenetic tree. Our framework combines the advantages of the classical comparative method with the robustness of corpus-based probabilistic models. We use this framework to explore the consequences of two different schemes for defining probabilistic models of phonological change, evaluating these schemes using the reconstruction of ancient word forms in Romance languages. The result is an efficient inference procedure for automatically inferring ancient word forms from modern languages, which can be generalized to support inferences about linguistic phylogenies.


The Tradeoffs of Large Scale Learning

Neural Information Processing Systems

This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation--estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff involving the computational complexity of the underlying optimization algorithms in non-trivial ways.


Multi-task Gaussian Process Prediction

Neural Information Processing Systems

In this paper we investigate multi-task learning in the context of Gaussian Processes (GP).We propose a model that learns a shared covariance function on input-dependent features and a "free-form" covariance matrix over tasks. This allows forgood flexibility when modelling inter-task dependencies while avoiding the need for large amounts of data for training. We show that under the assumption ofnoise-free observations and a block design, predictions for a given task only depend on its target values and therefore a cancellation of inter-task transfer occurs.We evaluate the benefits of our model on two practical applications: a compiler performance prediction problem and an exam score prediction task. Additionally, we make use of GP approximations and properties of our model in order to provide scalability to large data sets.