Statistical Learning
The Stability of Kernel Principal Components Analysis and its Relation to the Process Eigenspectrum
Williams, Christopher, Shawe-Taylor, John S.
C. K. I. Williams, School of Informatics, University of Edinburgh, c.k.i.williams@ed.ac.uk. Abstract: In this paper we analyze the relationships between the eigenvalues of the m × m Gram matrix K for a kernel k(·, ·) [...] We bound the differences between the two spectra and provide a performance bound on kernel PCA. 1 Introduction: Over recent years there has been a considerable amount of interest in kernel methods for supervised learning (e.g. Support Vector Machines and Gaussian Process prediction) and for unsupervised learning (e.g. [...]). In this paper we study the stability of the subspace of feature space extracted by kernel PCA with respect to the sample of size m, and relate this to the feature space that would be extracted in the infinite sample-size limit. This analysis essentially "lifts" into a (potentially infinite-dimensional) feature space an analysis which can also be carried out for PCA, comparing the k-dimensional eigenspace extracted from a sample covariance matrix with the k-dimensional eigenspace extracted from the population covariance matrix, and comparing the residuals from the k-dimensional compression for the m-sample and the population.
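As a rough illustration of the quantity under study, the sketch below estimates the largest eigenvalue of a Gram matrix, scaled by 1/m, for two sample sizes. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian (RBF) kernel with unit width, the standard normal input density, the use of power iteration, and the function names `rbf_kernel`, `gram_matrix`, `top_eigenvalue` and `scaled_top_eigenvalue`.

```python
import math
import random


def rbf_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel on scalars; the width is an illustrative choice."""
    return math.exp(-((x - y) ** 2) / (2.0 * width ** 2))


def gram_matrix(sample, kernel=rbf_kernel):
    """m x m Gram matrix K with entries K[i][j] = kernel(x_i, x_j)."""
    return [[kernel(a, b) for b in sample] for a in sample]


def top_eigenvalue(matrix, iterations=60):
    """Largest eigenvalue of a symmetric positive semi-definite matrix,
    approximated by power iteration started from the constant vector."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        eigenvalue = math.sqrt(sum(c * c for c in w))  # norm of K v, with |v| = 1
        v = [c / eigenvalue for c in w]
    return eigenvalue


def scaled_top_eigenvalue(m, seed):
    """Draw m points from a standard normal and return lambda_1(K) / m."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return top_eigenvalue(gram_matrix(sample)) / m


if __name__ == "__main__":
    for m, seed in [(40, 0), (120, 1)]:
        print(m, scaled_top_eigenvalue(m, seed))
```

Under these assumptions the scaled values for the two sample sizes should land near each other, which is the kind of sample-size stability the abstract refers to; the sketch does not reproduce any bound from the paper.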
Dyadic Classification Trees via Structural Risk Minimization
Classification trees are one of the most popular types of classifiers, with ease of implementation and interpretation being among their attractive features. Despite the widespread use of classification trees, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classification trees, called dyadic classification trees (DCTs), are near optimal (in a minimax sense) for a very broad range of classification problems. This demonstrates that other schemes (e.g., neural networks, support vector machines) cannot perform significantly better than DCTs in many cases. We also show that this near optimal performance is attained with linear (in the number of training data) complexity growing and pruning algorithms. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and which does not possess similar near optimality properties. Our analysis stems from theoretical results on structural risk minimization, on which the pruning rule for DCTs is based.
Kernel-Based Extraction of Slow Features: Complex Cells Learn Disparity and Translation Invariance from Natural Images
Bray, Alistair, Martinez, Dominique
In Slow Feature Analysis (SFA [1]), it has been demonstrated that high-order invariant properties can be extracted by projecting inputs into a nonlinear space and computing the slowest changing features in this space; this has been proposed as a simple general model for learning nonlinear invariances in the visual system. However, this method is highly constrained by the curse of dimensionality, which limits it to simple theoretical simulations. This paper demonstrates that by using a different but closely related objective function for extracting slowly varying features ([2, 3]), and then exploiting the kernel trick, this curse can be avoided. Using this new method we show that both the complex cell properties of translation invariance and disparity coding can be learnt simultaneously from natural images when complex cells are driven by simple cells also learnt from the image. The notion of maximising an objective function based upon the temporal predictability of output has been progressively applied in modelling the development of invariances in the visual system.
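To make "slowest changing features" concrete, the sketch below scores a one-dimensional output signal by the mean squared difference between consecutive samples after normalising it to zero mean and unit variance. This is the usual SFA-style slowness measure, used here only as an assumption for illustration; it is not the temporal-predictability objective of the paper, and the function name `slowness` and the test signals are hypothetical.

```python
import math


def slowness(signal):
    """Mean squared temporal difference of the standardised signal.
    Smaller values mean the signal varies more slowly."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((s - mean) ** 2 for s in signal) / n
    if variance == 0.0:
        raise ValueError("a constant signal has no defined slowness")
    scale = math.sqrt(variance)
    z = [(s - mean) / scale for s in signal]
    return sum((b - a) ** 2 for a, b in zip(z, z[1:])) / (n - 1)


# Two illustrative signals sampled at 100 Hz for 10 seconds.
times = [i / 100.0 for i in range(1000)]
slow_signal = [math.sin(2.0 * math.pi * 0.2 * t) for t in times]
fast_signal = [math.sin(2.0 * math.pi * 5.0 * t) for t in times]

if __name__ == "__main__":
    print(slowness(slow_signal), slowness(fast_signal))
```

Because the signal is standardised first, rescaling or shifting an output does not change its score, so a feature cannot appear slow merely by shrinking its amplitude.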
Spikernels: Embedding Spiking Neurons in Inner-Product Spaces
Shpigelman, Lavi, Singer, Yoram, Paz, Rony, Vaadia, Eilon
Inner-product operators, often referred to as kernels in statistical learning, define a mapping from some input space into a feature space. The focus of this paper is the construction of biologically-motivated kernels for cortical activities. The kernels we derive, termed Spikernels, map spike count sequences into an abstract vector space in which we can perform various prediction tasks. We discuss in detail the derivation of Spikernels and describe an efficient algorithm for computing their value on any two sequences of neural population spike counts. We demonstrate the merits of our modeling approach using the Spikernel and various standard kernels for the task of predicting hand movement velocities from cortical recordings. In all of our experiments, all the kernels we tested outperform the standard scalar product used in regression, with the Spikernel consistently achieving the best performance.
How Linear are Auditory Cortical Responses?
Sahani, Maneesh, Linden, Jennifer F.
By comparison to some other sensory cortices, the functional properties of cells in the primary auditory cortex are not yet well understood. Recent attempts to obtain a generalized description of auditory cortical responses have often relied upon characterization of the spectrotemporal receptive field (STRF), which amounts to a model of the stimulus-response function (SRF) that is linear in the spectrogram of the stimulus.
Bayesian Models of Inductive Generalization
Sanjana, Neville E., Tenenbaum, Joshua B.
We argue that human inductive generalization is best explained in a Bayesian framework, rather than by traditional models based on similarity computations. We go beyond previous work on Bayesian concept learning by introducing an unsupervised method for constructing flexible hypothesis spaces, and we propose a version of the Bayesian Occam's razor that trades off priors and likelihoods to prevent under- or over-generalization in these flexible spaces. We analyze two published data sets on inductive reasoning as well as the results of a new behavioral study that we have carried out.
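The trade-off between priors and likelihoods can be illustrated with a toy hypothesis space. The sketch below is an assumption-laden example, not the authors' model: hypotheses are plain sets of items, and the likelihood of n examples consistent with a hypothesis h is taken to be (1/|h|)^n, the "size principle" commonly used in Bayesian concept learning, so that smaller consistent hypotheses gain weight as examples accumulate. The names `posterior` and `generalization_probability` and the example hypotheses are hypothetical.

```python
def posterior(hypotheses, prior, observations):
    """Posterior over hypotheses (name -> set of items) given observed items.
    Likelihood is (1 / |h|) ** n for consistent hypotheses, else 0."""
    scores = {}
    for name, members in hypotheses.items():
        if all(item in members for item in observations):
            likelihood = (1.0 / len(members)) ** len(observations)
        else:
            likelihood = 0.0
        scores[name] = prior[name] * likelihood
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("no hypothesis is consistent with the observations")
    return {name: score / total for name, score in scores.items()}


def generalization_probability(item, hypotheses, post):
    """Probability that `item` belongs to the concept, averaged over hypotheses."""
    return sum(p for name, p in post.items() if item in hypotheses[name])


# Illustrative hypothesis space with a uniform prior.
hypotheses = {"small": {"a", "b"}, "large": {"a", "b", "c", "d"}}
prior = {"small": 0.5, "large": 0.5}

if __name__ == "__main__":
    for data in (["a"], ["a", "b"]):
        post = posterior(hypotheses, prior, data)
        print(data, post, generalization_probability("c", hypotheses, post))
```

In this toy setting, each additional example that fits the smaller hypothesis lowers the probability of generalizing to an item covered only by the larger one, which is one way a likelihood term can guard against over-generalization while the prior guards against under-generalization.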
Monte Carlo Methods for Tempo Tracking and Rhythm Quantization
We present a probabilistic generative model for timing deviations in expressive music performance. The structure of the proposed model is equivalent to a switching state space model. The switch variables correspond to discrete note locations as in a musical score. The continuous hidden variables denote the tempo. We formulate two well-known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization), as filtering and maximum a posteriori (MAP) state estimation tasks. Exact computation of posterior features such as the MAP state is intractable in this model class, so we introduce Monte Carlo methods for integration and optimization. We compare Markov Chain Monte Carlo (MCMC) methods (such as Gibbs sampling, simulated annealing and iterative improvement) and sequential Monte Carlo methods (particle filters). Our simulations suggest that sequential methods perform better. The methods can be applied in both online and batch scenarios such as tempo tracking and transcription, and are thus potentially useful in a number of music applications such as adaptive automatic accompaniment, score typesetting and music information retrieval.
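A sequential Monte Carlo approach to the filtering task can be sketched with a bootstrap particle filter. The example below is a heavily simplified assumption, not the model of the paper: it drops the discrete switch variables entirely, treats the hidden state as a single beat period following a Gaussian random walk, and observes noisy inter-onset intervals. The function names `simulate_intervals` and `track_period` and all parameter values are hypothetical.

```python
import math
import random


def simulate_intervals(n, period=0.5, obs_std=0.02, seed=1):
    """Noisy inter-onset intervals (seconds) around a constant beat period."""
    rng = random.Random(seed)
    return [period + rng.gauss(0.0, obs_std) for _ in range(n)]


def track_period(intervals, n_particles=500, process_std=0.01,
                 obs_std=0.02, seed=0):
    """Bootstrap particle filter; returns the filtered mean period per step."""
    rng = random.Random(seed)
    # Broad initial belief over plausible beat periods (an assumption).
    particles = [rng.uniform(0.3, 0.8) for _ in range(n_particles)]
    estimates = []
    for y in intervals:
        # Predict: propagate each particle through the random-walk dynamics.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: Gaussian observation likelihood, floored to avoid all zeros.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) + 1e-300
                   for p in particles]
        total = sum(weights)
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


if __name__ == "__main__":
    observed = simulate_intervals(60)
    print(track_period(observed)[-1])
```

Because each step uses only the current observation and the previous particle set, a filter of this shape can run online, one onset at a time, which matches the online scenario mentioned in the abstract.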