Goto

Collaborating Authors

 Technology


Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm

Neural Information Processing Systems

We propose a new kernel-based data transformation technique. It is founded on the principle of maximum entropy (MaxEnt) preservation, hence named kernel MaxEnt. The key measure is Renyi's entropy estimated via Parzen windowing. We show that kernel MaxEnt is based on eigenvectors, and is in that sense similar to kernel PCA, but may produce strikingly different transformed data sets. An enhanced spectral clustering algorithm is proposed, by replacing kernel PCA by kernel MaxEnt as an intermediate step. This has a major impact on performance.


Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models

Neural Information Processing Systems

Data sets that characterize human activity over time through collections of timestamped eventsor counts are of increasing interest in application areas as humancomputer interaction,video surveillance, and Web data analysis. We propose a nonparametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individualtime-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what "factors" are generating the observations on a particular day, including (forexample) weekday versus weekend effects or day-specific effects corresponding tounique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real-world data sets of count data involving both vehicles and people are used to illustrate the technique.


Sparse Representation for Signal Classification

Neural Information Processing Systems

In this paper, application of sparse representation (factorization) of signals over an overcomplete basis (dictionary) for signal classification is discussed. Searching forthe sparse representation of a signal over an overcomplete dictionary is achieved by optimizing an objective function that includes two terms: one that measures the signal reconstruction error and another that measures the sparsity. This objective function works well in applications where signals need to be reconstructed, likecoding and denoising. On the other hand, discriminative methods, such as linear discriminative analysis (LDA), are better suited for classification tasks. However, discriminative methods are usually sensitive to corruption in signals dueto lacking crucial properties for signal reconstruction. In this paper, we present a theoretical framework for signal classification with sparse representation. Theapproach combines the discrimination power of the discriminative methods with the reconstruction property and the sparsity of the sparse representation that enables one to deal with signal corruptions: noise, missing data and outliers. The proposed approach is therefore capable of robust classification with a sparse representation of signals. The theoretical results are demonstrated with signal classification tasks, showing that the proposed approach outperforms the standard discriminative methods and the standard sparse representation in the case of corrupted signals.


Correcting Sample Selection Bias by Unlabeled Data

Neural Information Processing Systems

We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate correctionsbased on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Ourmethod works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.



Geometric entropy minimization (GEM) for anomaly detection and localization

Neural Information Processing Systems

We introduce a novel adaptive nonparametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Suchgraphs have the property that as N their span recovers the entropy minimizing set that supports at least ρ K/N(100)% of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy minimizing set an anomaly can be declared at a statistical level of significance α 1 ρ. A method for implementing this nonparametric anomaly detector is proposed that approximates this minimum entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest neighbor graph on resampled subsets of the training data GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values. We illustrate GEM for several simulated and real data sets in high dimensional feature spaces.


Prediction on a Graph with a Perceptron

Neural Information Processing Systems

We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph - the graph diameter and the cut size of a partitioning of the graph - which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated.


Manifold Denoising

Neural Information Processing Systems

The presented denoising algorithm is based on a graph-based diffusion process of the point sample. We analyze this diffusion process using recent results aboutthe convergence of graph Laplacians. In the experiments we show that our method is capable of dealing with nontrivial high-dimensional noise. Moreover usingthe denoising algorithm as pre-processing method we can improve the results of a semi-supervised learning algorithm.


Graph-Based Visual Saliency

Neural Information Processing Systems

A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: rst forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofaras it is naturally parallelized. This model powerfully predicts human xations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%.


Learning Nonparametric Models for Probabilistic Imitation

Neural Information Processing Systems

Learning by imitation represents an important mechanism for rapid acquisition of new behaviors in humans and robots. A critical requirement for learning by imitation is the ability to handle uncertainty arising from the observation process as well as the imitator's own dynamics and interactions with the environment. In this paper, we present a new probabilistic method for inferring imitative actions that takes into account both the observations of the teacher as well as the imitator's dynamics. Our key contribution is a nonparametric learning method which generalizes to systems with very different dynamics. Rather than relying on a known forward model of the dynamics, our approach learns a nonparametric forward model via exploration. Leveraging advances in approximate inference in graphical models, we show how the learned forward model can be directly used to plan an imitating sequence. We provide experimental results for two systems: a biome-chanical model of the human arm and a 25-degrees-of-freedom humanoid robot. We demonstrate that the proposed method can be used to learn appropriate motor inputs to the model arm which imitates the desired movements. A second set of results demonstrates dynamically stable full-body imitation of a human teacher by the humanoid robot.