Goto

Collaborating Authors

 Technology


The Effect of Correlated Input Data on the Dynamics of Learning

Neural Information Processing Systems

The convergence properties of the gradient descent algorithm in the case of the linear perceptron may be obtained from the response function. We derive a general expression for the response function and apply it to the case of data with simple input correlations. It is found that correlations severely may slow down learning. This explains the success of PCA as a method for reducing training time. Motivated by this finding we furthermore propose to transform the input data by removing the mean across input variables as well as examples to decrease correlations. Numerical findings for a medical classification problem are in fine agreement with the theoretical results.


Size of Multilayer Networks for Exact Learning: Analytic Approach

Neural Information Processing Systems

This article presents a new result about the size of a multilayer neural network computing real outputs for exact learning of a finite set of real samples. The architecture of the network is feedforward, with one hidden layer and several outputs. Starting from a fixed training set, we consider the network as a function of its weights. We derive, for a wide family of transfer functions, a lower and an upper bound on the number of hidden units for exact learning, given the size of the dataset and the dimensions of the input and output spaces. The context of our work is rather similar to the well-known results of Baum et al. [1, 2,3,5, 10], but we consider both real inputs and outputs, instead ofthe dichotomies usually addressed.


Support Vector Regression Machines

Neural Information Processing Systems

A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.


Multilayer Neural Networks: One or Two Hidden Layers?

Neural Information Processing Systems

In dimension d 2, Gibson characterized the functions computable with just one hidden layer, under the assumption that there is no "multiple intersection point" and that f


Dynamics of Training

Neural Information Processing Systems

A new method to calculate the full training process of a neural network is introduced. No sophisticated methods like the replica trick are used. The results are directly related to the actual number of training steps. Some results are presented here, like the maximal learning rate, an exact description of early stopping, and the necessary number of training steps. Further problems can be addressed with this approach.


For Valid Generalization the Size of the Weights is More Important than the Size of the Network

Neural Information Processing Systems

Baum and Haussler [4] used these results to give sample size bounds for multi-layer threshold networks Generalization and the Size of the Weights in Neural Networks 135 that grow at least as quickly as the number of weights (see also [7]). However, for pattern classification applications the VC-bounds seem loose; neural networks often perform successfully with training sets that are considerably smaller than the number of weights. This paper shows that for classification problems on which neural networks perform well, if the weights are not too big, the size of the weights determines the generalization performance. In contrast with the function classes and algorithms considered in the VC-theory, neural networks used for binary classification problems have real-valued outputs, and learning algorithms typically attempt to minimize the squared error of the network output over a training set. As well as encouraging the correct classification, this tends to push the output away from zero and towards the target values of { -1, I}.


Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient

Neural Information Processing Systems

The parameter space of neural networks has a Riemannian metric structure. The natural Riemannian gradient should be used instead of the conventional gradient, since the former denotes the true steepest descent direction of a loss function in the Riemannian space. The behavior of the stochastic gradient learning algorithm is much more effective if the natural gradient is used. The present paper studies the information-geometrical structure of perceptrons and other networks, and prove that the online learning method based on the natural gradient is asymptotically as efficient as the optimal batch algorithm. Adaptive modification of the learning constant is proposed and analyzed in terms of the Riemannian measure and is shown to be efficient.


A Model of Recurrent Interactions in Primary Visual Cortex

Neural Information Processing Systems

A general feature of the cerebral cortex is its massive interconnectivity - it has been estimated anatomically [19] that cortical neurons receive upwards of 5,000 synapses, the majority of which originate from other nearby cortical neurons. Numerous experiments in primary visual cortex (VI) have revealed strongly nonlinear interactions between stimulus elements which activate classical and nonclassical receptive field regions. Recurrent cortical connections likely contribute substantially to these effects. However, most theories of visual processing have either assumed a feedforward processing scheme [7], or have used recurrent interactions to account for isolated effects only [1, 16, 18]. Since nonlinear systems cannot in general be taken apart and analyzed in pieces, it is not clear what one learns by building a recurrent model that only accounts for one, or very few phenomena. Here we develop a relatively simple model of recurrent interactions in VI, that reflects major anatomical and physiological features of intracortical connectivity, and simultaneously accounts for a wide range of phenomena observed physiologically. All phenomena we address are strongly nonlinear, and cannot be explained by linear feedforward models.


Cholinergic Modulation Preserves Spike Timing Under Physiologically Realistic Fluctuating Input

Neural Information Processing Systems

Recently, there has been a vigorous debate concerning the nature of neural coding (Rieke et al. 1996; Stevens and Zador 1995; Shadlen and Newsome 1994). The prevailing view has been that the mean firing rate conveys all information about the sensory stimulus in a spike train and the precise timing of the individual spikes is noise. This belief is, in part, based on a lack of correlation between the precise timing of the spikes and the sensory qualities of the stimulus under study, particularly, on a lack of spike timing repeatability when identical stimulation is delivered. This view has been challenged by a number of recent studies, in which highly repeatable temporal patterns of spikes can be observed both in vivo (Bair and Koch 1996; Abeles et al. 1993) and in vitro (Mainen and Sejnowski 1994). Furthermore, application of information theory to the coding problem in the frog and house fly (Bialek et al. 1991; Bialek and Rieke 1992) suggested that additional information could be extracted from spike timing. In the absence of direct evidence for a timing code in the cerebral cortex, the role of spike timing in neural coding remains controversial.


An Architectural Mechanism for Direction-tuned Cortical Simple Cells: The Role of Mutual Inhibition

Neural Information Processing Systems

A linear architectural model of cortical simple cells is presented. The model evidences how mutual inhibition, occurring through synaptic coupling functions asymmetrically distributed in space, can be a possible basis for a wide variety of spatiotemporal simple cell response properties, including direction selectivity and velocity tuning. While spatial asymmetries are included explicitly in the structure of the inhibitory interconnections, temporal asymmetries originate from the specific mutual inhibition scheme considered. Extensive simulations supporting the model are reported.