Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks
Solla, Sara A.
CONNECT, The Niels Bohr Institute, Blegdamsvej 17, 2100 Copenhagen, Denmark
We consider the problem of online gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process. Two-layer networks with an arbitrary number of hidden units have been shown to be universal approximators [1] for maps from N-dimensional inputs to a single output. We investigate the emergence of generalization ability in an online learning scenario [2], in which the couplings are modified after the presentation of each example so as to minimize the corresponding error. The resulting changes in {J} are described as a dynamical evolution; the number of examples plays the role of time.
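As a hedged illustration of this setting (not the paper's analytic formalism), the sketch below trains a two-layer "soft committee" student network by online gradient descent against a fixed teacher. The network sizes, the tanh transfer function, and the 1/N scaling of the learning rate eta are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 100, 3, 3            # input dimension, student and teacher hidden units
eta = 0.1                      # learning rate controlling the dynamics
B = rng.standard_normal((M, N)) / np.sqrt(N)   # fixed teacher couplings
J = 0.01 * rng.standard_normal((K, N))         # student couplings {J}

g = np.tanh                    # hidden-unit transfer function

def student(x): return g(J @ x).sum()
def teacher(x): return g(B @ x).sum()

for t in range(10_000):        # the number of examples plays the role of time
    x = rng.standard_normal(N)
    err = student(x) - teacher(x)
    # gradient step on the squared error 0.5 * err**2 for each coupling J_k;
    # the 1/N scaling of eta follows the thermodynamic-limit convention
    J -= (eta / N) * err * (1.0 - g(J @ x) ** 2)[:, None] * x[None, :]
```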
Sample Complexity for Learning Recurrent Perceptron Mappings
DasGupta, Bhaskar, Sontag, Eduardo D.
Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on the sample complexity associated with fitting such models to experimental data.
1 Introduction
One of the most popular approaches to binary pattern classification, underlying many statistical techniques, is based on perceptrons or linear discriminants; see for instance the classical reference (Duda and Hart, 1973).
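A minimal sketch of a recurrent perceptron in this spirit, assuming the classifier thresholds the final value of an order-k linear recurrence (the linear digital filter) run over the input sequence; the paper's exact definition may differ:

```python
import numpy as np

def recurrent_perceptron(x, c):
    """x: input sequence; c: recurrence coefficients (length k), the 'weights'."""
    k = len(c)
    state = list(x[:k])                  # seed the recurrence with the first k inputs
    for t in range(k, len(x)):
        # next value = linear combination of the previous k values plus the input
        state.append(np.dot(c, state[-k:]) + x[t])
    return 1 if state[-1] >= 0 else -1   # sign of the filter output gives the label

label = recurrent_perceptron(np.array([0.5, -1.2, 0.3, 0.9, -0.4]),
                             np.array([0.7, -0.1]))
```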
Worst-case Loss Bounds for Single Neurons
Helmbold, David P., Kivinen, Jyrki, Warmuth, Manfred K.
We analyze and compare the well-known Gradient Descent algorithm and a new algorithm, called the Exponentiated Gradient algorithm, for training a single neuron with an arbitrary transfer function. Both algorithms are easily generalized to larger neural networks, and the generalization of Gradient Descent is the standard back-propagation algorithm. In this paper we prove worst-case loss bounds for both algorithms in the single neuron case. Since local minima make it difficult to prove worst-case bounds for gradient-based algorithms, we must use a loss function that prevents the formation of spurious local minima. We define such a matching loss function for any strictly increasing differentiable transfer function and prove worst-case loss bounds for any such transfer function and its corresponding matching loss. For example, the matching loss for the identity function is the square loss, and the matching loss for the logistic sigmoid is the entropic loss. The different structure of the bounds for the two algorithms indicates that the new algorithm outperforms Gradient Descent when the inputs contain a large number of irrelevant components.
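The two update rules being compared can be made concrete. The sketch below is a non-authoritative rendering for a single logistic neuron under its matching (entropic) loss; it uses the fact that the matching-loss gradient is (yhat - y) * x, and the simplex renormalization in the EG step follows the usual normalized variant of the algorithm:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_update(w, x, y, eta=0.1):
    yhat = sigmoid(w @ x)
    return w - eta * (yhat - y) * x            # additive update (Gradient Descent)

def eg_update(w, x, y, eta=0.1):
    yhat = sigmoid(w @ x)
    w_new = w * np.exp(-eta * (yhat - y) * x)  # multiplicative update (Exponentiated Gradient)
    return w_new / w_new.sum()                 # renormalize onto the simplex
```

The multiplicative form is what drives EG's advantage on inputs with many irrelevant components: weights on irrelevant coordinates decay geometrically rather than linearly.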
Modeling Saccadic Targeting in Visual Search
Rao, Rajesh P. N., Zelinsky, Gregory J., Hayhoe, Mary M., Ballard, Dana H.
Visual cognition depends critically on the ability to make rapid eye movements known as saccades that orient the fovea over targets of interest in a visual scene. Saccades are known to be ballistic: the pattern of muscle activation for foveating a prespecified target location is computed prior to the movement and visual feedback is precluded. Despite these distinctive properties, there has been no general model of the saccadic targeting strategy employed by the human visual system during visual search in natural scenes. This paper proposes a model for saccadic targeting that uses iconic scene representations derived from oriented spatial filters at multiple scales. Visual search proceeds in a coarse-to-fine fashion with the largest scale filter responses being compared first. The model was empirically tested by comparing its performance with actual eye movement data from human subjects in a natural visual search task; preliminary results indicate substantial agreement between eye movements predicted by the model and those recorded from human subjects.
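A schematic rendition of the coarse-to-fine strategy, with plain 2x subsampling and a sum-of-squared-differences match standing in for the model's oriented multiscale filter responses (both are stand-in assumptions):

```python
import numpy as np

def downsample(img):
    # 2x subsampling as a crude stand-in for moving to a coarser scale
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h:2, :w:2]

def ssd_match(scene, patch):
    # exhaustive sum-of-squared-differences search at one scale
    ph, pw = patch.shape
    scores = np.array([[np.sum((scene[i:i+ph, j:j+pw] - patch) ** 2)
                        for j in range(scene.shape[1] - pw + 1)]
                       for i in range(scene.shape[0] - ph + 1)])
    return np.unravel_index(scores.argmin(), scores.shape)

def coarse_to_fine_saccade(scene, target, levels=3):
    pyramid = [(scene, target)]
    for _ in range(levels - 1):
        s, t = pyramid[-1]
        pyramid.append((downsample(s), downsample(t)))
    y, x = ssd_match(*pyramid[-1])           # largest scale compared first
    for s, t in reversed(pyramid[:-1]):      # refine at successively finer scales
        y, x = 2 * y, 2 * x                  # project the fixation to the finer scale
        ph, pw = t.shape
        best = (np.inf, y, x)
        for dy in range(-2, 3):              # local search around the projection
            for dx in range(-2, 3):
                i, j = y + dy, x + dx
                if 0 <= i <= s.shape[0] - ph and 0 <= j <= s.shape[1] - pw:
                    score = np.sum((s[i:i+ph, j:j+pw] - t) ** 2)
                    if score < best[0]:
                        best = (score, i, j)
        _, y, x = best
    return y, x                              # predicted fixation (top-left corner)
```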
Information through a Spiking Neuron
Stevens, Charles F., Zador, Anthony M.
While it is generally agreed that neurons transmit information about their synaptic inputs through spike trains, the code by which this information is transmitted is not well understood. An upper bound on the information encoded is obtained by hypothesizing that the precise timing of each spike conveys information. Here we develop a general approach to quantifying the information carried by spike trains under this hypothesis, and apply it to the leaky integrate-and-fire (IF) model of neuronal dynamics. We formulate the problem in terms of the probability distribution p(T) of interspike intervals (ISIs), assuming that spikes are detected with arbitrary but finite temporal resolution. In the absence of added noise, all the variability in the ISIs could encode information, and the information rate is simply the entropy of the ISI distribution, H(T) = -Σ p(T) log2 p(T), times the spike rate. H(T) thus provides an exact expression for the information rate. The methods developed here can be used to determine experimentally the information carried by spike trains, even when the lower bound of the information rate provided by the stimulus reconstruction method is not tight. In a preliminary series of experiments, we have used these methods to estimate information rates of hippocampal neurons in slice in response to somatic current injection. These pilot experiments suggest information rates as high as 6.3 bits/spike.
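As a worked example of the expression above, the sketch below estimates the information rate from a list of spike times: ISIs are binned at a finite temporal resolution dt, the entropy H(T) = -Σ p(T) log2 p(T) gives bits per spike, and multiplying by the spike rate gives bits per second (the binning scheme and rate estimate are simplifying assumptions):

```python
import numpy as np

def info_rate(spike_times, dt=1e-3):
    isis = np.diff(np.sort(spike_times))           # interspike intervals
    bins = np.arange(0.0, isis.max() + dt, dt)     # finite temporal resolution dt
    p, _ = np.histogram(isis, bins=bins)
    p = p / p.sum()
    p = p[p > 0]                                   # drop empty bins before taking logs
    H = -(p * np.log2(p)).sum()                    # ISI entropy, in bits per spike
    rate = len(spike_times) / (spike_times.max() - spike_times.min())
    return H * rate                                # information rate in bits per second
```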
Generalisation of A Class of Continuous Neural Networks
Shawe-Taylor, John, Zhao, Jieyu
More recently, attempts have been made to introduce some computational cost related to the accuracy of the computations [5]. The model proposed in this paper weakens the computational power still further by relying on classical boolean circuits to perform the computation using a simple encoding of the real values. Using this encoding we also show that TC0 circuits interpreted in the model correspond to a Neural Network design referred to as Bit Stream Neural Networks, which have been developed for hardware implementation [8]. With the perspective afforded by the general approach considered here, we are also able to analyse the Bit Stream Neural Networks (or indeed any other adaptive system based on the technique), giving VC dimension and sample size bounds for PAC learning.
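For intuition about the bit-stream encoding, here is a hedged sketch of the classical stochastic-computing trick that such networks build on: a value v in [0,1] becomes a Bernoulli(v) bit stream, and multiplying independent values reduces to a bitwise AND; the paper's precise circuit model and encoding may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(v, n_bits=10_000):
    return rng.random(n_bits) < v    # bit stream with P(bit = 1) = v

def decode(stream):
    return stream.mean()             # empirical frequency of 1s recovers the value

a, b = encode(0.3), encode(0.6)
product = decode(a & b)              # AND of independent streams ~ 0.3 * 0.6 = 0.18
```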
Neural Networks with Quadratic VC Dimension
Koiran, Pascal, Sontag, Eduardo D.
A set of labeled training samples is provided, and a network must be obtained which is then expected to correctly classify previously unseen inputs. In this context, a central problem is to estimate the amount of training data needed to guarantee satisfactory learning performance. To study this question, it is necessary to first formalize the notion of learning from examples. One such formalization is based on the paradigm of probably approximately correct (PAC) learning, due to Valiant (1984). In this framework, one starts by fitting some function f, chosen from a predetermined class F, to the given training data. The class F is often called the "hypothesis class", and for purposes of this discussion it will be assumed that the functions in F take binary values {0, 1} and are defined on a common domain X.
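To make the sample-size question concrete, the snippet below evaluates a standard sufficient PAC bound in terms of the VC dimension d of F, the accuracy eps, and the confidence delta; the constants follow one common textbook form and vary across references:

```python
from math import log2, ceil

def pac_sample_bound(d, eps=0.1, delta=0.05):
    # A sufficient sample size of the form m >= (4/eps)(d log2(12/eps) + log2(2/delta))
    return ceil((4 / eps) * (d * log2(12 / eps) + log2(2 / delta)))

print(pac_sample_bound(d=50))   # examples sufficient for a class with VC dimension 50
```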
Rapid Quality Estimation of Neural Network Input Representations
Cherkauer, Kevin J., Shavlik, Jude W.
However, ANNs are usually costly to train, preventing one from trying many different representations. In this paper, we address this problem by introducing and evaluating three new measures for quickly estimating ANN input representation quality. Two of these, called DBleaves and Min(leaves), consistently outperform Rendell and Ragavan's (1993) blurring measure in accurately ranking different input representations for ANN learning on three difficult, real-world datasets.
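A heavily hedged sketch of the leaf-count idea: since the exact definitions of DBleaves and Min(leaves) are not given here, the stand-in below scores a representation by the smallest leaf count among decision trees grown on random half-samples (smaller trees suggesting a crisper concept), and ranks candidate representations by that score:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def leaf_score(X, y, n_trials=5):
    # Min(leaves)-style aggregate: smallest leaf count over trees
    # grown on random half-samples of the data (illustrative only)
    counts = []
    for seed in range(n_trials):
        idx = np.random.default_rng(seed).choice(len(y), size=len(y) // 2,
                                                 replace=False)
        tree = DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])
        counts.append(tree.get_n_leaves())
    return min(counts)

def rank_representations(reps, y):
    # reps: dict mapping representation name -> feature matrix (rows align with y)
    return sorted(reps, key=lambda name: leaf_score(reps[name], y))
```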
Implementation Issues in the Fourier Transform Algorithm
Mansour, Yishay, Sahar, Sigal
Tel-Aviv University, Tel-Aviv, Israel
The Fourier transform of boolean functions has come to play an important role in proving many important learnability results. We aim to demonstrate that the Fourier transform techniques are also a useful and practical algorithm, in addition to being a powerful theoretical tool. We describe the more prominent changes we have introduced to the algorithm, ones that were crucial and without which the performance of the algorithm would severely deteriorate. One of the benefits we present is the confidence level for each prediction, which measures the likelihood that the prediction is correct.
1 INTRODUCTION
The Fourier transform has been used mainly to demonstrate the learnability of various classes of functions with respect to the uniform distribution. The work of [5] developed a very powerful algorithmic procedure: given a function and a threshold parameter, it finds in polynomial time all the Fourier coefficients of the function larger than the threshold.
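A minimal sketch of the prediction-with-confidence scheme: here a low-degree sampling estimator stands in for the membership-query procedure of [5] (an assumed simplification); estimated Fourier coefficients above the threshold are kept, and each prediction is the sign of the reconstructed value, with its magnitude reported as the confidence level:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

def chi(S, x):
    # parity character chi_S(x) = (-1)^(sum of x_i over i in S)
    return (-1) ** int(np.sum(x[list(S)]))

def estimate_coeffs(f, n, threshold=0.2, n_samples=2000, max_degree=2):
    # f: boolean function {0,1}^n -> {-1,+1}, evaluated on uniform samples
    xs = rng.integers(0, 2, size=(n_samples, n))
    ys = np.array([f(x) for x in xs])
    coeffs = {}
    for d in range(max_degree + 1):
        for S in itertools.combinations(range(n), d):
            est = np.mean([y * chi(S, x) for x, y in zip(xs, ys)])
            if abs(est) >= threshold:        # keep only the large coefficients
                coeffs[S] = est
    return coeffs

def predict(coeffs, x):
    val = sum(c * chi(S, x) for S, c in coeffs.items())
    return (1 if val >= 0 else -1), abs(val)  # (prediction, confidence level)
```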
A Model of Spatial Representations in Parietal Cortex Explains Hemineglect
Pouget, Alexandre, Sejnowski, Terrence J.
We have recently developed a theory of spatial representations in which the position of an object is not encoded in a particular frame of reference but, instead, involves neurons computing basis functions of their sensory inputs. This type of representation is able to perform nonlinear sensorimotor transformations and is consistent with the response properties of parietal neurons. We now ask whether the same theory could account for the behavior of human patients with parietal lesions. These lesions induce a deficit known as hemineglect that is characterized by a lack of reaction to stimuli located in the hemispace contralateral to the lesion. A simulated lesion in a basis function representation was found to replicate three of the most important aspects of hemineglect: (i) the model failed to cross the leftmost lines in line cancellation experiments, (ii) the deficit affected multiple frames of reference, and (iii) it could be object centered. These results strongly support the basis function hypothesis for spatial representations and provide a computational theory of hemineglect at the single cell level.
1 Introduction
According to current theories of spatial representations, the positions of objects are represented in multiple modules throughout the brain, each module being specialized for a particular sensorimotor transformation and using its own frame of reference. For instance, the lateral intraparietal area (LIP) appears to encode the location of objects in oculocentric coordinates, presumably for the control of saccadic eye movements.
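A schematic sketch of the basis function account and a simulated lesion, with Gaussian retinal tuning multiplied by a sigmoidal eye-position gain field; the unit parameters and the form of the lesion are illustrative assumptions:

```python
import numpy as np

def basis_responses(retinal_x, eye_x, prefs_r, prefs_e, sigma=5.0):
    # each unit multiplies a Gaussian tuning curve for retinal location
    # by a sigmoid of eye position (a "gain field") -- a basis function
    gauss = np.exp(-(retinal_x - prefs_r) ** 2 / (2 * sigma ** 2))
    gain = 1.0 / (1.0 + np.exp(-(eye_x - prefs_e)))
    return gauss * gain

prefs_r = np.linspace(-40, 40, 81)    # preferred retinal locations (degrees)
prefs_e = np.zeros_like(prefs_r)      # preferred eye positions (simplified)
healthy = basis_responses(-20.0, 0.0, prefs_r, prefs_e)

lesion = prefs_r < 0                  # remove units coding the left hemispace
lesioned = healthy.copy()
lesioned[lesion] = 0.0                # responses to left-side stimuli are abolished
```

With the left-preferring units silenced, stimuli in the contralateral hemispace evoke little response, giving the left-right gradient characteristic of neglect.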