Reducing Spike Train Variability: A Computational Theory Of Spike-Timing Dependent Plasticity
Bohte, Sander M., Mozer, Michael C.
Experimental studies have observed synaptic potentiation when a presynaptic neuron fires shortly before a postsynaptic neuron, and synaptic depression when the presynaptic neuron fires shortly after. The dependence of synaptic modulation on the precise timing of the two action potentials is known as spike-timing dependent plasticity, or STDP. We derive STDP from a simple computational principle: synapses adapt so as to minimize the postsynaptic neuron's variability to a given presynaptic input, causing the neuron's output to become more reliable in the face of noise. Using an entropy-minimization objective function and the biophysically realistic spike-response model of Gerstner (2001), we simulate neurophysiological experiments and obtain the characteristic STDP curve along with other phenomena, including the reduction in synaptic plasticity as synaptic efficacy increases. We compare our account to other efforts to derive STDP from computational principles, and argue that our account provides the most comprehensive coverage of the phenomena. Thus, reliability of neural response in the face of noise may be a key goal of cortical adaptation.
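The characteristic STDP curve mentioned above is often summarized by a two-sided exponential window. The sketch below uses that common phenomenological form purely for illustration. It is not the entropy-minimization derivation described in the abstract, and the function name `stdp_window` and all amplitudes and time constants are assumed values.

```python
import math


def stdp_window(dt_ms, a_plus=1.0, a_minus=0.5, tau_plus=20.0, tau_minus=20.0):
    """Illustrative weight change for dt_ms = t_post - t_pre, in milliseconds.

    All parameter values are assumptions chosen for illustration only.
    """
    if dt_ms > 0:
        # Presynaptic spike precedes postsynaptic spike: potentiation.
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        # Presynaptic spike follows postsynaptic spike: depression.
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


if __name__ == "__main__":
    for dt in (-40, -10, 10, 40):
        print(dt, round(stdp_window(dt), 4))
```

In this form the magnitude of the change decays as the two spikes move further apart in time, in either direction.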
Exponentiated Gradient Algorithms for Large-margin Structured Classification
Bartlett, Peter L., Collins, Michael, Taskar, Ben, McAllester, David A.
We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-family (Gibbs distribution) representation of structured objects. The algorithm is efficient, even in cases where the number of labels y is exponential in size, provided that certain expectations under Gibbs distributions can be calculated efficiently. The method for structured labels relies on a more general result, specifically the application of exponentiated gradient updates [7, 8] to quadratic programs.
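As a rough illustration of exponentiated gradient updates applied to a quadratic program, the sketch below minimizes a small quadratic objective over the probability simplex. It is a generic toy example, not the structured large-margin algorithm of the paper. The names `objective` and `eg_minimize`, the matrix `A`, the vector `b`, the learning rate `eta`, and the step count are all assumptions made for this example.

```python
import math


def objective(a, A, b):
    """Quadratic objective 0.5 * a^T A a + b^T a."""
    n = len(a)
    quad = sum(a[i] * A[i][j] * a[j] for i in range(n) for j in range(n))
    return 0.5 * quad + sum(b[i] * a[i] for i in range(n))


def eg_minimize(A, b, eta=0.5, steps=200):
    """Minimize the objective over the simplex with multiplicative updates."""
    n = len(b)
    a = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(steps):
        # Gradient of the quadratic objective: A a + b.
        grad = [sum(A[i][j] * a[j] for j in range(n)) + b[i] for i in range(n)]
        # Multiplicative update, then renormalize to stay on the simplex.
        unnorm = [a[i] * math.exp(-eta * grad[i]) for i in range(n)]
        z = sum(unnorm)
        a = [u / z for u in unnorm]
    return a


if __name__ == "__main__":
    A = [[2.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    a = eg_minimize(A, b)
    print(a, objective(a, A, b))
```

Because each update multiplies by a positive factor and renormalizes, the iterate stays nonnegative and sums to one without an explicit projection step.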
A Hidden Markov Model for de Novo Peptide Sequencing
Fischer, Bernd, Roth, Volker, Grossmann, Jonas, Baginsky, Sacha, Gruissem, Wilhelm, Roos, Franz, Widmayer, Peter, Buhmann, Joachim M.
De novo sequencing of peptides is a challenging task in proteome research. While there exist reliable DNA sequencing methods, the high-throughput de novo sequencing of proteins by mass spectrometry is still an open problem. Current approaches suffer from a lack of precision in detecting mass peaks in the spectrograms. In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. Experiments demonstrate that this new method significantly outperforms standard approaches in matching quality.
Learning Gaussian Process Kernels via Hierarchical Bayes
Schwaighofer, Anton, Tresp, Volker, Yu, Kai
We present a novel method for learning with Gaussian process regression in a hierarchical Bayesian framework. In a first step, kernel matrices on a fixed set of input points are learned from data using a simple and efficient EM algorithm. This step is nonparametric, in that it does not require a parametric form of covariance function. In a second step, kernel functions are fitted to approximate the learned covariance matrix using a generalized Nyström method, which results in a complex, data-driven kernel. We evaluate our approach as a recommendation engine for art images, where the proposed hierarchical Bayesian method leads to excellent prediction performance.
Edge of Chaos Computation in Mixed-Mode VLSI - A Hard Liquid
Schürmann, Felix, Meier, Karlheinz, Schemmel, Johannes
Computation without stable states is a computing paradigm different from Turing's and has been demonstrated for various types of simulated neural networks. This publication transfers this paradigm to a hardware-implemented neural network. Results of a software implementation are reproduced, showing that the performance peaks when the network exhibits dynamics at the edge of chaos. The liquid computing approach seems well suited for operating analog computing devices such as the VLSI neural network used here.
A Topographic Support Vector Machine: Classification Using Local Label Configurations
Mohr, Johannes, Obermayer, Klaus
The standard approach to the classification of objects is to consider the examples as independent and identically distributed (iid). In many real-world settings, however, this assumption is not valid, because a topographical relationship exists between the objects. In this contribution we consider the special case of image segmentation, where the objects are pixels and where the underlying topography is a 2D regular rectangular grid. We introduce a classification method which uses not only measured vectorial feature information but also the label configuration within a topographic neighborhood. Due to the resulting dependence between the labels of neighboring pixels, a collective classification of a set of pixels becomes necessary. We propose a new method called 'Topographic Support Vector Machine' (TSVM), which is based on a topographic kernel and a self-consistent solution to the label assignment shown to be equivalent to a recurrent neural network. The performance of the algorithm is compared to a conventional SVM on a cell image segmentation task.
Parallel Support Vector Machines: The Cascade SVM
Graf, Hans P., Cosatto, Eric, Bottou, Léon, Dourdanovic, Igor, Vapnik, Vladimir
We describe an algorithm for support vector machines (SVM) that can be parallelized efficiently and scales to very large problems with hundreds of thousands of training vectors. Instead of analyzing the whole training set in one optimization step, the data are split into subsets and optimized separately with multiple SVMs. The partial results are combined and filtered again in a 'Cascade' of SVMs, until the global optimum is reached. The Cascade SVM can be spread over multiple processors with minimal communication overhead and requires far less memory, since the kernel matrices are much smaller than for a regular SVM. Convergence to the global optimum is guaranteed with multiple passes through the Cascade, but even a single pass provides good generalization. A single pass is 5x to 10x faster than a regular SVM for problems of 100,000 vectors when implemented on a single processor. Parallel implementations on a cluster of 16 processors were tested with over 1 million vectors (2-class problems), converging in a day or two, while a regular SVM never converged in over a week.
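The memory argument above can be made concrete with a back-of-the-envelope count of kernel matrix entries. The sketch below assumes a full n-by-n kernel matrix per SVM and an even split of 100,000 vectors into 16 subsets. The split size is an assumption for illustration, and the count covers only the first layer of a cascade, since the sizes of later layers depend on how many support vectors survive each stage.

```python
def kernel_entries(n_vectors):
    """Number of entries in a full n-by-n kernel matrix."""
    return n_vectors * n_vectors


n_total = 100_000   # problem size quoted in the abstract
n_subsets = 16      # assumed number of first-layer subsets
subset_size = n_total // n_subsets

full = kernel_entries(n_total)
per_subset = kernel_entries(subset_size)
first_layer_total = n_subsets * per_subset

if __name__ == "__main__":
    print("full kernel matrix entries:   ", full)
    print("entries per first-layer SVM:  ", per_subset)
    print("entries across the first layer:", first_layer_total)
```

Under these assumptions each first-layer kernel matrix is smaller than the full one by a factor of 16 squared, and the first layer as a whole by a factor of 16.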
Breaking SVM Complexity with Cross-Training
Bottou, Léon, Weston, Jason, Bakir, Gökhan H.
We propose to selectively remove examples from the training set using probabilistic estimates related to editing algorithms (Devijver and Kittler, 1982). This heuristic procedure aims at creating a separable distribution of training examples with minimal impact on the position of the decision boundary. It breaks the linear dependency between the number of SVs and the number of training examples, and sharply reduces the complexity of SVMs during both the training and prediction stages.