Clustering with the Fisher Score
Tsuda, Koji, Kawanabe, Motoaki, Müller, Klaus-Robert
Recently, the Fisher score (or the Fisher kernel) has been increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of the log-likelihood of a probabilistic model. This paper gives a theoretical analysis of how class information is preserved in the space of the Fisher score, which turns out to consist of a few important dimensions carrying class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means-type methods are clearly inappropriate because they make use of all dimensions. We therefore develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit the important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
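The Fisher score construction is easy to make concrete. Below is a minimal sketch (a toy model chosen for illustration, not the authors' code): Fisher score features for a one-dimensional, two-component Gaussian mixture with known variance, differentiating the log-likelihood with respect to the component means only.

```python
import numpy as np

# Toy sketch: the Fisher score of a point x is the gradient of
# log p(x | theta) w.r.t. the model parameters theta. Here theta is
# the two component means of a 1-D Gaussian mixture (variance known),
# so each data point maps to a 2-dimensional feature vector.

def fisher_score(x, weights, means, var):
    """Rows are points, columns are d(log p)/d(mu_k)."""
    dens = weights * np.exp(-(x[:, None] - means) ** 2 / (2 * var)) \
           / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)   # responsibilities
    return resp * (x[:, None] - means) / var        # d log p / d mu_k

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
scores = fisher_score(x, weights=np.array([0.5, 0.5]),
                      means=np.array([-2.0, 2.0]), var=1.0)
print(scores.shape)  # (200, 2): one dimension per model parameter
```

For richer models the score vector has one dimension per parameter, which is where the many nuisance dimensions discussed in the abstract come from.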
Selectivity and Metaplasticity in a Unified Calcium-Dependent Model
Yeung, Luk Chong, Blais, Brian S., Cooper, Leon N., Shouval, Harel Z.
A unified, biophysically motivated Calcium-Dependent Learning model has been shown to account for various rate-based and spike time-dependent paradigms for inducing synaptic plasticity. Here, we investigate the properties of this model for a multi-synapse neuron that receives inputs with different spike-train statistics. In addition, we present a physiological form of metaplasticity, an activity-driven regulation mechanism, that is essential for the robustness of the model.
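For readers who want to experiment, here is a toy simulation in the spirit of a calcium-dependent rule; the functional forms, thresholds, and constants below are illustrative placeholders, not the authors' published parameters.

```python
import numpy as np

# Toy sketch: the weight is driven toward a U-shaped function Omega of
# the calcium level (no change at low Ca, depression at intermediate
# Ca, potentiation at high Ca), with a calcium-dependent rate eta.
# All shapes and constants here are assumptions for illustration.

def omega(ca, theta_d=0.35, theta_p=0.55, steep=80.0):
    sig = lambda u: 1.0 / (1.0 + np.exp(-steep * u))
    return 0.25 + sig(ca - theta_p) - 0.25 * sig(ca - theta_d)

def eta(ca, tau0=1e3):
    return ca / (ca + 1.0) / tau0           # faster learning at high Ca

def step_weight(w, ca, dt=1.0):
    """One Euler step of dw/dt = eta(Ca) * (Omega(Ca) - w)."""
    return w + dt * eta(ca) * (omega(ca) - w)

w = 0.5
for t in range(5000):
    ca = 0.8 * np.exp(-((t - 2500) / 800.0) ** 2)   # toy Ca transient
    w = step_weight(w, ca)
print(f"final weight: {w:.3f}")
```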
String Kernels, Fisher Kernels and Finite State Automata
Saunders, Craig, Vinokourov, Alexei, Shawe-Taylor, John S.
In this paper we show how the generation of documents can be thought of as a k-stage Markov process, which leads to a Fisher kernel from which the n-gram and string kernels can be reconstructed. The Fisher kernel view gives a more flexible insight into the string kernel and suggests how it can be parametrised in a way that reflects the statistics of the training corpus. Furthermore, the probabilistic modelling approach suggests extending the Markov process to consider subsequences of varying length, rather than the standard fixed-length approach used in the string kernel. We give a procedure for determining which subsequences are informative features and hence generate a Finite State Machine model, which can again be used to obtain a Fisher kernel. By adjusting the parametrisation we can also influence the weighting received by the features; in this way we are able to obtain a logarithmic weighting in a Fisher kernel. Finally, experiments are reported comparing the different kernels, using the standard Bag of Words kernel as a baseline.
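As a point of reference, the fixed-length string kernel that the abstract builds on can be written down in a few lines. The sketch below (plain Python, not tied to the paper's Fisher-kernel parametrisation) computes the n-gram spectrum kernel by counting shared contiguous substrings.

```python
from collections import Counter

# The n-gram spectrum kernel: K(s, t) is the inner product of the
# n-gram count vectors of the two strings, i.e. the number of shared
# contiguous length-n substrings, weighted by their counts.

def ngram_counts(s, n):
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def ngram_kernel(s, t, n=3):
    """K(s, t) = sum over n-grams u of count_s(u) * count_t(u)."""
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    return sum(cs[u] * ct[u] for u in cs if u in ct)

print(ngram_kernel("the cat sat", "the cat ran"))
```

The paper's contribution is to recover this kernel (and generalisations of it) from the gradient of a Markov model's log-likelihood, so that the feature weighting reflects corpus statistics rather than raw counts.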
How the Poverty of the Stimulus Solves the Poverty of the Stimulus
Language acquisition is a special kind of learning problem because the outcome of learning in one generation is the input for the next. This makes it possible for languages to adapt to the particularities of the learner. In this paper, I show that this type of language change has important consequences for models of the evolution and acquisition of syntax.

1 The Language Acquisition Problem

For both artificial systems and nonhuman animals, learning the syntax of natural languages is a notoriously hard problem. All healthy human infants, in contrast, learn any of the approximately 6000 human languages rapidly, accurately and spontaneously. Any explanation of how they accomplish this difficult task must specify the (innate) inductive bias that human infants bring to bear, and the input data that is available to them. Traditionally, the inductive bias is termed - somewhat unfortunately - "Universal Grammar", and the input data "primary linguistic data". Over the last 30 years or so, a view of the acquisition of the syntax of natural language has become popular that puts much emphasis on the innate machinery. In this view, which one can call the "Principles and Parameters" model, the Universal Grammar specifies most aspects of syntax in great detail [e.g.
The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging
Schwaighofer, Anton, Tresp, Volker, Mayer, Peter, Scheel, Alexander K., Müller, Gerhard A.
We describe the RA scanner, a novel system for the examination of patients suffering from rheumatoid arthritis. The RA scanner is based on a novel laser-based imaging technique which is sensitive to the optical characteristics of finger joint tissue. Based on the laser images, finger joints are classified according to whether the inflammatory status has improved or worsened. To perform the classification task, various linear and kernel-based systems were implemented and their performances were compared. Special emphasis was put on measures to reliably perform parameter tuning and evaluation, since only a very small data set was available. Based on the results presented in this paper, it was concluded that the RA scanner permits a reliable classification of pathological finger joints, thus paving the way for further development from prototype to product stage.
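The abstract's emphasis on reliable tuning and evaluation with very small data is commonly realized with nested cross-validation, where parameter selection (inner loop) is kept strictly separate from performance estimation (outer loop). The sketch below illustrates that protocol with synthetic placeholder data and an arbitrary SVM parameter grid; it is not the RA-scanner data or the authors' models.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Placeholder data standing in for a small clinical data set.
X, y = make_classification(n_samples=80, n_features=10, random_state=0)

# Inner loop: grid-search the SVM parameters on each training fold.
inner = GridSearchCV(SVC(kernel="rbf"),
                     param_grid={"C": [0.1, 1, 10],
                                 "gamma": [0.01, 0.1, 1]},
                     cv=5)

# Outer loop: estimate generalization of the whole tuning procedure.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(f"nested-CV accuracy: {outer_scores.mean():.2f} "
      f"+/- {outer_scores.std():.2f}")
```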
Learning in Spiking Neural Assemblies
We consider a statistical framework for learning in a class of networks of spiking neurons. Our aim is to show how optimal local learning rules can be readily derived once the neural dynamics and desired functionality of the neural assembly have been specified, in contrast to other models which assume (sub-optimal) learning rules. Within this framework we derive local rules for learning temporal sequences in a model of spiking neurons and demonstrate its superior performance to correlation (Hebbian) based approaches. We further show how to include mechanisms such as synaptic depression, and outline how the framework is readily extensible to learning in networks of highly complex spiking neurons. A stochastic quantal vesicle release mechanism is considered, and implications for the complexity of learning are discussed.
Learning Sparse Multiscale Image Representations
Sallee, Phil, Olshausen, Bruno A.
We describe a method for learning sparse multiscale image representations using a sparse prior distribution over the basis function coefficients. The prior consists of a mixture of a Gaussian and a Dirac delta function, and thus encourages coefficients to take exactly zero values. Coefficients for an image are computed by sampling from the resulting posterior distribution with a Gibbs sampler. The learned basis is similar to the Steerable Pyramid basis, and yields slightly higher SNR for the same number of active coefficients. Denoising using the learned image model is demonstrated for some standard test images, with results that compare favorably with other denoising methods.
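The core inferential step, Gibbs sampling under a Gaussian-plus-delta ("spike-and-slab") prior, can be sketched in a few lines. The following is an illustrative toy version with a generic linear model and made-up hyperparameters, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch: Gibbs sampling of coefficients under the prior
#   a_i ~ (1 - p) * delta(0) + p * N(0, sa2)
# with a linear Gaussian likelihood x = Phi @ a + N(0, sn2 * I).
# Each sweep resamples one coefficient from its exact conditional:
# first "active vs. exactly zero", then its value if active.

def gibbs_sweep(x, Phi, a, rng, p=0.1, sn2=0.01, sa2=1.0):
    for i in range(len(a)):
        phi = Phi[:, i]
        r = x - Phi @ a + phi * a[i]            # residual excluding a_i
        s2 = 1.0 / (phi @ phi / sn2 + 1.0 / sa2)
        m = s2 * (phi @ r) / sn2
        # log odds of the Gaussian slab vs. the spike at zero
        log_odds = (np.log(p / (1 - p))
                    + 0.5 * np.log(s2 / sa2) + 0.5 * m * m / s2)
        if rng.random() < 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50, 50))):
            a[i] = rng.normal(m, np.sqrt(s2))
        else:
            a[i] = 0.0
    return a

rng = np.random.default_rng(1)
Phi = rng.normal(size=(16, 32))                 # overcomplete "basis"
a_true = np.zeros(32); a_true[[3, 20]] = [1.5, -2.0]
x = Phi @ a_true + 0.1 * rng.normal(size=16)
a = np.zeros(32)
for _ in range(200):
    a = gibbs_sweep(x, Phi, a, rng)
print(np.nonzero(a)[0])  # a sparse support, often the planted one
```

Because the spike places finite mass exactly at zero, the sampler produces representations with genuinely inactive coefficients rather than merely small ones.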
Boosted Dyadic Kernel Discriminants
Moghaddam, Baback, Shakhnarovich, Gregory
We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). The algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer's conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVMs. Like SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an online and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy).
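A minimal sketch of the idea follows: each weak learner is a kernel hyperplane induced by one positive and one negative training point, h(x) = sign(K(x_plus, x) - K(x_minus, x)), and discrete AdaBoost greedily picks the best pair each round. This simplified version (discrete rather than confidence-rated boosting, zero bias term, toy data) is for illustration only.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def boost_hypercuts(X, y, rounds=20):
    n = len(y)
    K = rbf(X, X)
    pos, neg = np.where(y > 0)[0], np.where(y < 0)[0]
    w = np.full(n, 1.0 / n)                 # AdaBoost sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for i in pos:                       # exhaustive search over dyads
            for j in neg:
                h = np.sign(K[i] - K[j])    # dyadic hypercut, bias = 0
                err = w[h != y].sum()
                if best is None or err < best[0]:
                    best = (err, i, j, h)
        err, i, j, h = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * h); w /= w.sum()
        ensemble.append((alpha, i, j))
    return ensemble

def predict(ensemble, X_train, X_test):
    Kt = rbf(X_train, X_test)
    f = sum(a * (Kt[i] - Kt[j]) for a, i, j in ensemble)
    return np.sign(f)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.sign(X[:, 0] * X[:, 1]); y[y == 0] = 1   # XOR-like labels
ens = boost_hypercuts(X, y)
print((predict(ens, X, X) == y).mean())          # training accuracy
```

Note how the ensemble size directly controls the number of kernel evaluations at run time, the complexity/accuracy trade-off the abstract mentions.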
Manifold Parzen Windows
Vincent, Pascal, Bengio, Yoshua
The similarity between objects is a fundamental element of many learning algorithms. Most nonparametric methods take this similarity to be fixed, but much recent work has shown the advantages of learning it, in particular to exploit the local invariances in the data or to capture the possibly nonlinear manifold on which most of the data lie. We propose a new nonparametric kernel density estimation method which captures the local structure of an underlying manifold through the leading eigenvectors of regularized local covariance matrices.
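A simplified sketch of the idea (not the authors' exact estimator): a Parzen-windows density where each training point gets its own Gaussian whose covariance is a regularized local covariance, keeping the leading eigen-directions of the k-nearest-neighbour covariance and flattening the remaining directions to a small noise variance. The choices of k, d, and sigma2 below are illustrative assumptions.

```python
import numpy as np

def manifold_parzen_logdensity(X_train, X_test, k=10, d=1, sigma2=0.01):
    n, D = X_train.shape
    comps = []
    for x in X_train:
        # local covariance over the k nearest neighbours of x
        idx = np.argsort(((X_train - x) ** 2).sum(1))[1:k + 1]
        C = np.cov((X_train[idx] - x).T) + sigma2 * np.eye(D)
        vals, vecs = np.linalg.eigh(C)          # ascending eigenvalues
        if d < D:
            vals[:-d] = sigma2                  # flatten minor directions
        comps.append((x, vals, vecs))
    # log of the mixture density: average of the n local Gaussians
    logp = np.full(len(X_test), -np.inf)
    for x, vals, vecs in comps:
        z = (X_test - x) @ vecs                 # rotate to the eigenbasis
        ll = (-0.5 * (z ** 2 / vals).sum(1)
              - 0.5 * np.log(2 * np.pi * vals).sum())
        logp = np.logaddexp(logp, ll - np.log(n))
    return logp

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(200, 2))
grid = np.array([[1.0, 0.0], [0.0, 0.0]])       # on vs. off the circle
print(manifold_parzen_logdensity(X, grid))      # higher on the manifold
```

Because each Gaussian is elongated along the locally estimated manifold directions, the estimator assigns high density along the data manifold while remaining sharp in the off-manifold directions.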