Goto

Collaborating Authors

 Asia


Linear Hinge Loss and Average Margin

Neural Information Processing Systems

We describe a unifying method for proving relative loss bounds for online linear threshold classification algorithms, such as the Perceptron and the Winnow algorithms. For classification problems the discrete loss is used, i.e., the total number of prediction mistakes. We introduce a continuous loss function, called the "linear hinge loss", that can be employed to derive the updates of the algorithms. We first prove bounds w.r.t. the linear hinge loss and then convert them to the discrete loss. We introduce a notion of "average margin" of a set of examples. We show how relative loss bounds based on the linear hinge loss can be converted to relative loss bounds i.t.o. the discrete loss using the average margin.


Finite-Dimensional Approximation of Gaussian Processes

Neural Information Processing Systems

Gaussian process (GP) prediction suffers from O(n3) scaling with the data set size n. By using a finite-dimensional basis to approximate the GP predictor, the computational complexity can be reduced. We derive optimal finite-dimensional predictors under a number of assumptions, and show the superiority of these predictors over the Projected Bayes Regression method (which is asymptotically optimal). We also show how to calculate the minimal model size for a given n. The calculations are backed up by numerical experiments.


Dynamics of Supervised Learning with Restricted Training Sets

Neural Information Processing Systems

We study the dynamics of supervised learning in layered neural networks, in the regime where the size p of the training set is proportional to the number N of inputs. Here the local fields are no longer described by Gaussian distributions.


Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks

Neural Information Processing Systems

We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC dimension grows as W log W, where W is the number of parameters in the network. The VC dimension is an important measure of the complexity of a class of binaryvalued functions, since it characterizes the amount of data required for learning in the PAC setting (see [BEHW89, Vap82]). In this paper, we establish upper and lower bounds on the VC dimension of a specific class of multi-layered feedforward neural networks. Let F be the class of binary-valued functions computed by a feed forward neural network with W weights and k computational (non-input) units, each with a piecewise polynomial activation function.


Tractable Variational Structures for Approximating Graphical Models

Neural Information Processing Systems

Graphical models provide a broad probabilistic framework with applications in speech recognition (Hidden Markov Models), medical diagnosis (Belief networks) and artificial intelligence (Boltzmann Machines). However, the computing time is typically exponential in the number of nodes in the graph. Within the variational framework for approximating these models, we present two classes of distributions, decimatable Boltzmann Machines and Tractable Belief Networks that go beyond the standard factorized approach. We give generalised mean-field equations for both these directed and undirected approximations. Simulation results on a small benchmark problem suggest using these richer approximations compares favorably against others previously reported in the literature. 1 Introduction Graphical models provide a powerful framework for probabilistic inference[l] but suffer intractability when applied to large scale problems.


Multi-Electrode Spike Sorting by Clustering Transfer Functions

Neural Information Processing Systems

Since every electrode is in a different position it will measure a different contribution from each of the different neurons. Simply stated, the problem is this: how can these complex signals be untangled to determine when each individual cell fired? This problem is difficult because, a) the objects being classified are very similar and often noisy, b) spikes coming from the same cell can ·Permanent address: Institute of Computer Science and Center for Neural Computation, The Hebrew University, Jerusalem, Israel.


Signal Detection in Noisy Weakly-Active Dendrites

Neural Information Processing Systems

Here we derive measures quantifying the information loss of a synaptic signal due to the presence of neuronal noise sources, as it electrotonically propagates along a weakly-active dendrite. We model the dendrite as an infinite linear cable, with noise sources distributed along its length. The noise sources we consider are thermal noise, channel noise arising from the stochastic nature of voltage-dependent ionic channels (K and Na) and synaptic noise due to spontaneous background activity. We assess the efficacy of information transfer using a signal detection paradigm where the objective is to detect the presence/absence of a presynaptic spike from the post-synaptic membrane voltage. This allows us to analytically assess the role of each of these noise sources in information transfer. For our choice of parameters, we find that the synaptic noise is the dominant noise source which limits the maximum length over which information be reliably transmitted. 1 Introduction This is a continuation of our efforts (Manwani and Koch, 1998) to understand the information capacity ofa neuronal link (in terms of the specific nature of neural "hardware") by a systematic study of information processing at different biophysical stages in a model of a single neuron. Here we investigate how the presence of neuronal noise sources influences the information transmission capabilities of a simplified model of a weakly-active dendrite. The noise sources we include are, thermal noise, channel noise arising from the stochastic nature of voltage-dependent channels (K and Na) and synaptic noise due to spontaneous background activity. We characterize the noise sources using analytical expressions of their current power spectral densities and compare their magnitudes for dendritic parameters reported in literature (Mainen and Sejnowski, 1998).


Analyzing and Visualizing Single-Trial Event-Related Potentials

Neural Information Processing Systems

Event-related potentials (ERPs), are portions of electroencephalographic (EEG) recordings that are both time-and phase-locked to experimental events. ERPs are usually averaged to increase their signal/noise ratio relative to non-phase locked EEG activity, regardless of the fact that response activity in single epochs may vary widely in time course and scalp distribution. This study applies a linear decomposition tool, Independent Component Analysis (ICA) [1], to multichannel single-trial EEG records to derive spatial filters that decompose single-trial EEG epochs into a sum of temporally independent and spatially fixed components arising from distinct or overlapping brain or extra-brain networks. Our results on normal and autistic subjects show that ICA can separate artifactual, stimulus-locked, response-locked, and.


Synergy and Redundancy among Brain Cells of Behaving Monkeys

Neural Information Processing Systems

Determining the relationship between the activity of a single nerve cell to that of an entire population is a fundamental question that bears on the basic neural computation paradigms. In this paper we apply an information theoretic approach to quantify the level of cooperative activity among cells in a behavioral context. It is possible to discriminate between synergetic activity of the cells vs. redundant activity, depending on the difference between the information they provide when measured jointly and the information they provide independently. We define a synergy value that is positive in the first case and negative in the second and show that the synergy value can be measured by detecting the behavioral mode of the animal from simultaneously recorded activity of the cells. We observe that among cortical cells positive synergy can be found, while cells from the basal ganglia, active during the same task, do not exhibit similar synergetic activity.