Statistical Learning
A Nonparametric Approach to Bottom-Up Visual Saliency
Kienzle, Wolf, Wichmann, Felix A., Franz, Matthias O., Schölkopf, Bernhard
This paper addresses the bottom-up influence of local image information on human eyemovements. Most existing computational models use a set of biologically plausiblelinear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that--despite the lack of any biological prior knowledge--our model performs comparably to existing approaches, andin fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system.
Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
Jenssen, Robert, Eltoft, Torbjørn, Girolami, Mark, Erdogmus, Deniz
We propose a new kernel-based data transformation technique. It is founded on the principle of maximum entropy (MaxEnt) preservation, hence named kernel MaxEnt. The key measure is Renyi's entropy estimated via Parzen windowing. We show that kernel MaxEnt is based on eigenvectors, and is in that sense similar to kernel PCA, but may produce strikingly different transformed data sets. An enhanced spectral clustering algorithm is proposed, by replacing kernel PCA by kernel MaxEnt as an intermediate step. This has a major impact on performance.
Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models
Ihler, Alexander T., Smyth, Padhraic
Data sets that characterize human activity over time through collections of timestamped eventsor counts are of increasing interest in application areas as humancomputer interaction,video surveillance, and Web data analysis. We propose a nonparametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individualtime-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what "factors" are generating the observations on a particular day, including (forexample) weekday versus weekend effects or day-specific effects corresponding tounique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real-world data sets of count data involving both vehicles and people are used to illustrate the technique.
Sparse Representation for Signal Classification
In this paper, application of sparse representation (factorization) of signals over an overcomplete basis (dictionary) for signal classification is discussed. Searching forthe sparse representation of a signal over an overcomplete dictionary is achieved by optimizing an objective function that includes two terms: one that measures the signal reconstruction error and another that measures the sparsity. This objective function works well in applications where signals need to be reconstructed, likecoding and denoising. On the other hand, discriminative methods, such as linear discriminative analysis (LDA), are better suited for classification tasks. However, discriminative methods are usually sensitive to corruption in signals dueto lacking crucial properties for signal reconstruction. In this paper, we present a theoretical framework for signal classification with sparse representation. Theapproach combines the discrimination power of the discriminative methods with the reconstruction property and the sparsity of the sparse representation that enables one to deal with signal corruptions: noise, missing data and outliers. The proposed approach is therefore capable of robust classification with a sparse representation of signals. The theoretical results are demonstrated with signal classification tasks, showing that the proposed approach outperforms the standard discriminative methods and the standard sparse representation in the case of corrupted signals.
Correcting Sample Selection Bias by Unlabeled Data
Huang, Jiayuan, Gretton, Arthur, Borgwardt, Karsten M., Schölkopf, Bernhard, Smola, Alex J.
We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate correctionsbased on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Ourmethod works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
Geometric entropy minimization (GEM) for anomaly detection and localization
We introduce a novel adaptive nonparametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Suchgraphs have the property that as N their span recovers the entropy minimizing set that supports at least ρ K/N(100)% of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy minimizing set an anomaly can be declared at a statistical level of significance α 1 ρ. A method for implementing this nonparametric anomaly detector is proposed that approximates this minimum entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest neighbor graph on resampled subsets of the training data GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values. We illustrate GEM for several simulated and real data sets in high dimensional feature spaces.
Prediction on a Graph with a Perceptron
Herbster, Mark, Pontil, Massimiliano
We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph - the graph diameter and the cut size of a partitioning of the graph - which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated.
Manifold Denoising
The presented denoising algorithm is based on a graph-based diffusion process of the point sample. We analyze this diffusion process using recent results aboutthe convergence of graph Laplacians. In the experiments we show that our method is capable of dealing with nontrivial high-dimensional noise. Moreover usingthe denoising algorithm as pre-processing method we can improve the results of a semi-supervised learning algorithm.