Bayesian Modeling of Human Concept Learning
Tenenbaum, Joshua B.
I consider the problem of learning concepts from small numbers of positive examples, a feat which humans perform routinely but which computers are rarely capable of. Bridging machine learning and cognitive science perspectives, I present both theoretical analysis and an empirical study with human subjects for the simple task of learning concepts corresponding to axis-aligned rectangles in a multidimensional feature space. Existing learning models, when applied to this task, cannot explain how subjects generalize from only a few examples of the concept. I propose a principled Bayesian model based on the assumption that the examples are a random sample from the concept to be learned. The model gives precise fits to human behavior on this simple task and provides qualitative insights into more complex, realistic cases of concept learning.
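To make the random-sampling assumption concrete, here is a minimal sketch (not the paper's code): hypotheses are axis-aligned rectangles on a grid, each example is assumed drawn uniformly from the true concept, so a rectangle of area |h| containing all n examples gets likelihood |h|^(-n) (the "size principle"). The grid resolution and uniform prior are illustrative assumptions.

```python
# Bayesian concept learning over axis-aligned rectangles: small rectangles
# that contain all examples dominate the posterior as n grows.
import itertools
import numpy as np

def generalization_map(examples, grid=20, extent=1.0):
    """P(new point is in the concept | examples), marginalized over rectangles."""
    ticks = np.linspace(0.0, extent, grid + 1)
    step = extent / grid
    cx, cy = np.meshgrid(ticks[:-1] + step / 2, ticks[:-1] + step / 2, indexing="ij")
    xs, ys = np.array(examples).T
    prob, total = np.zeros((grid, grid)), 0.0
    for x0, x1 in itertools.combinations(ticks, 2):
        if xs.min() < x0 or xs.max() > x1:
            continue  # likelihood is zero unless the rectangle holds every example
        for y0, y1 in itertools.combinations(ticks, 2):
            if ys.min() < y0 or ys.max() > y1:
                continue
            w = ((x1 - x0) * (y1 - y0)) ** (-len(examples))  # size principle
            total += w
            prob += w * ((cx >= x0) & (cx <= x1) & (cy >= y0) & (cy <= y1))
    return prob / total

gm = generalization_map([(0.40, 0.50), (0.50, 0.55), (0.45, 0.60)])
print(gm[9, 11].round(3), gm[0, 0].round(3))  # high near the examples, low far away
```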
Spike-Based Compared to Rate-Based Hebbian Learning
Kempter, Richard, Gerstner, Wulfram, Hemmen, J. Leo van
For example, a 'Hebbian' (Hebb 1949) learning rule which is driven by the correlations between presynaptic and postsynaptic rates may be used to generate neuronal receptive fields (e.g., Linsker 1986, MacKay and Miller 1990, Wimbauer et al. 1997) with properties similar to those of real neurons. A rate-based description, however, neglects effects which are due to the pulse structure of neuronal signals.
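A minimal sketch of the rate-based rule the abstract contrasts with spike-based learning; the rectified-linear neuron, Poisson input rates, learning rate, and hard weight bounds are illustrative assumptions.

```python
# Rate-based Hebbian learning: the weight update is driven by the product
# (correlation) of presynaptic and postsynaptic rates. A spike-based rule
# would additionally depend on the relative timing of individual spikes.
import numpy as np

rng = np.random.default_rng(0)
n_pre, eta = 10, 1e-4
w = rng.uniform(0.0, 0.2, size=n_pre)        # feedforward weights

for _ in range(5000):
    pre = rng.poisson(lam=5.0, size=n_pre)   # presynaptic rates (toy units)
    post = max(w @ pre, 0.0)                 # rectified-linear postsynaptic rate
    w += eta * pre * post                    # Hebbian correlation term
    w = np.clip(w, 0.0, 1.0)                 # hard bounds keep weights finite

print(w.round(3))
```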
Orientation, Scale, and Discontinuity as Emergent Properties of Illusory Contour Shape
Thornber, Karvel K., Williams, Lance R.
A recent neural model of illusory contour formation is based on a distribution of natural shapes traced by particles moving with constant speed in directions given by Brownian motions. The input to that model consists of pairs of position and direction constraints and the output consists of the distribution of contours joining all such pairs. In general, these contours will not be closed and their distribution will not be scale-invariant. In this paper, we show how to compute a scale-invariant distribution of closed contours given position constraints alone and use this result to explain a well-known illusory contour effect.

1 INTRODUCTION

It has been proposed by Mumford [3] that the distribution of illusory contour shapes can be modeled by particles travelling with constant speed in directions given by Brownian motions. More recently, Williams and Jacobs [7, 8] introduced the notion of a stochastic completion field, the distribution of particle trajectories joining pairs of position and direction constraints, and showed how it could be computed in a local parallel network.
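A minimal sketch of the underlying particle model: direction diffuses as a Brownian motion while speed stays constant. Step count, speed, and diffusion strength are illustrative assumptions; the actual completion field would weight trajectories by the constraints they satisfy.

```python
# Particles move with constant speed; direction undergoes Brownian motion.
import numpy as np

def sample_trajectory(x, y, theta, steps=200, speed=0.01, sigma=0.15, rng=None):
    rng = rng or np.random.default_rng()
    path = [(x, y)]
    for _ in range(steps):
        theta += sigma * rng.normal()      # Brownian motion in direction
        x += speed * np.cos(theta)         # constant-speed translation
        y += speed * np.sin(theta)
        path.append((x, y))
    return np.array(path)

# A completion field would weight each trajectory by whether it also meets a
# second position/direction constraint; here we only draw sample paths.
paths = [sample_trajectory(0.0, 0.0, 0.0) for _ in range(3)]
print(paths[0][-1].round(3))
```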
Direct Optimization of Margins Improves Generalization in Combined Classifiers
Mason, Llew, Bartlett, Peter L., Baxter, Jonathan
[Figure: margin distributions; the dark curve is AdaBoost, the light curve is DOOM. DOOM sacrifices significant training error for improved test error (horizontal marks on the margin-0 line).]

1 Introduction

Many learning algorithms for pattern classification minimize some cost function of the training data, with the aim of minimizing error (the probability of misclassifying an example). One example of such a cost function is simply the classifier's error on the training data.
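A minimal sketch of what "a cost function of the margins" means for a voted classifier: the margin of example (x, y) is y·f(x), where f is a normalized weighted vote of base classifiers, and training minimizes the average of a monotone cost of the margins rather than the training error itself. The sigmoid-shaped cost and random votes below are illustrative stand-ins, not DOOM's exact cost or optimizer.

```python
# Margin-based cost for a combined (voted) classifier.
import numpy as np

def margins(votes, alphas, y):
    """votes: (n_examples, n_classifiers) of +/-1 predictions; y: +/-1 labels."""
    f = votes @ alphas / np.abs(alphas).sum()  # normalized combined output
    return y * f                               # positive = correct, with confidence

def cost(votes, alphas, y, lam=5.0):
    # Smooth, monotone decreasing function of the margin (sigmoid-shaped).
    return np.mean(1.0 / (1.0 + np.exp(lam * margins(votes, alphas, y))))

rng = np.random.default_rng(0)
votes = rng.choice([-1, 1], size=(100, 10))
y = rng.choice([-1, 1], size=100)
print(cost(votes, np.ones(10), y).round(3))
```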
General Bounds on Bayes Errors for Regression with Gaussian Processes
Opper, Manfred, Vivarelli, Francesco
Based on a simple convexity lemma, we develop bounds for different types of Bayesian prediction errors for regression with Gaussian processes. The basic bounds are formulated for a fixed training set. Simpler expressions are obtained for sampling from an input distribution which equals the weight function of the covariance kernel, yielding asymptotically tight results. The results are compared with numerical experiments.
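A minimal sketch of the quantity such bounds control: for a fixed training set and a correct prior, the Bayes error of the posterior-mean predictor at a test input is the GP posterior variance k(x,x) − k_x^T (K + σ²I)^{-1} k_x. The RBF kernel and noise level here are illustrative assumptions.

```python
# Bayesian prediction error for GP regression = posterior variance at x.
import numpy as np

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def bayes_error(x_train, x_test, noise=0.1):
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf(x_train, x_test)
    # Posterior variance of f(x_test) given the fixed training inputs.
    return rbf(x_test, x_test).diagonal() - np.einsum(
        "ij,ij->j", k_star, np.linalg.solve(K, k_star))

x_train = np.linspace(0, 1, 8)
print(bayes_error(x_train, np.array([0.05, 0.5, 0.95])).round(4))
```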
Support Vector Machines Applied to Face Recognition
Phillips, P. Jonathon
On the other hand, in face recognition there are many individuals (classes) and only a few images (samples) per person, and algorithms must recognize faces by extrapolating from the training samples. In numerous applications there can be only one training sample (image) of each person. Support vector machines (SVMs) are formulated to solve a classical two-class pattern recognition problem. We adapt SVMs to face recognition by modifying the interpretation of the output of an SVM classifier and devising a representation of facial images that is concordant with a two-class problem. A traditional SVM returns a binary value, the class of the object.
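A minimal sketch of one way to read the adaptation described: pose verification as a two-class problem over difference representations of image pairs (same person vs. different people), then rank identities by the SVM's real-valued output rather than its sign. The feature vectors, difference operator, and scikit-learn usage are illustrative assumptions, not the paper's exact representation.

```python
# SVM output reinterpreted as a similarity score over image-pair differences.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 64                                           # toy face-feature dimension
gallery = {pid: rng.normal(size=d) for pid in range(20)}

def pair_diff(a, b):
    return np.abs(a - b)                         # difference representation

# Class +1: two images of the same person; class -1: images of two people.
X, y = [], []
for pid, face in gallery.items():
    X.append(pair_diff(face, face + 0.1 * rng.normal(size=d))); y.append(+1)
    other = gallery[(pid + 1) % len(gallery)]
    X.append(pair_diff(face, other)); y.append(-1)

svm = SVC(kernel="linear").fit(np.array(X), y)

def similarity(probe, candidate):
    # Real-valued margin, not the binary sign(.), ranks candidate identities.
    return svm.decision_function(pair_diff(probe, candidate)[None])[0]

probe = gallery[3] + 0.1 * rng.normal(size=d)
print(max(gallery, key=lambda pid: similarity(probe, gallery[pid])))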
A Randomized Algorithm for Pairwise Clustering
Gdalyahu, Yoram, Weinshall, Daphna, Werman, Michael
We present a stochastic clustering algorithm based on pairwise similarity of data points. Our method extends existing deterministic methods, including agglomerative algorithms, min-cut graph algorithms, and connected components. Thus it provides a common framework for all these methods. Our graph-based method differs from existing stochastic methods, which are based on analogy to physical systems. The stochastic nature of our method makes it more robust against noise, including accidental edges and small spurious clusters. We demonstrate the superiority of our algorithm using an example with three spiraling bands and a large amount of noise.

1 Introduction

Clustering algorithms can be divided into two categories: those that require a vectorial representation of the data and those that use only a pairwise representation. In the former case, every data item must be represented as a vector in a real normed space, while in the latter case only pairwise relations of similarity or dissimilarity are used.
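A minimal sketch in the spirit of stochastic pairwise clustering: sample random subgraphs of the similarity graph, take connected components, and average co-membership over samples so accidental edges and tiny spurious clusters wash out. The edge-sampling rule here is an illustrative approximation, not the paper's exact algorithm.

```python
# Stochastic, graph-based clustering from pairwise similarities.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def stochastic_coassociation(S, n_samples=200, rng=None):
    """S: symmetric similarity matrix in [0, 1]; returns co-membership rates."""
    rng = rng or np.random.default_rng()
    n = len(S)
    co = np.zeros((n, n))
    for _ in range(n_samples):
        keep = rng.random((n, n)) < S              # keep edges w.p. ~ similarity
        keep = np.triu(keep, 1)
        keep = keep | keep.T                       # symmetric edge sample
        _, labels = connected_components(csr_matrix(keep))
        co += labels[:, None] == labels[None, :]   # count co-memberships
    return co / n_samples

S = np.array([[1.0, 0.9, 0.9, 0.1],
              [0.9, 1.0, 0.9, 0.1],
              [0.9, 0.9, 1.0, 0.1],
              [0.1, 0.1, 0.1, 1.0]])
print(stochastic_coassociation(S).round(2))
```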
Kernel PCA and De-Noising in Feature Spaces
Mika, Sebastian, Schölkopf, Bernhard, Smola, Alex J., Müller, Klaus-Robert, Scholz, Matthias, Rätsch, Gunnar
Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered a natural generalization of linear principal component analysis. This gives rise to the question of how to use nonlinear features for data compression, reconstruction, and de-noising, applications common in linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high-dimensional feature space and need not have pre-images in input space. This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real-world data.
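A minimal sketch of the fixed-point iteration commonly used for approximate pre-images with Gaussian kernels: given a feature-space point expressed as an expansion Σ_i γ_i φ(x_i), iterate z toward the kernel-weighted mean of the training points. The expansion coefficients gamma (which would come from the kernel PCA projection of the point being reconstructed) and the kernel width are assumed given.

```python
# Fixed-point pre-image iteration for the Gaussian kernel:
#   z <- sum_i gamma_i * k(z, x_i) * x_i / sum_i gamma_i * k(z, x_i)
import numpy as np

def preimage(X, gamma, width=1.0, iters=100):
    """X: (n, d) training points; gamma: (n,) expansion coefficients."""
    z = X[np.argmax(gamma)].copy()               # start from a heavy point
    for _ in range(iters):
        k = np.exp(-np.sum((X - z) ** 2, axis=1) / (2 * width ** 2))
        w = gamma * k
        denom = w.sum()
        if abs(denom) < 1e-12:                   # iteration can be unstable
            break
        z = (w[:, None] * X).sum(axis=0) / denom
    return z

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
gamma = rng.random(30)
print(preimage(X, gamma).round(3))
```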