Second Order Properties of Error Surfaces: Learning Time and Generalization
LeCun, Yann, Kanter, Ido, Solla, Sara A.
The learning time of a simple neural network model is obtained through an analytic computation of the eigenvalue spectrum for the Hessian matrix, which describes the second order properties of the cost function in the space of coupling coefficients. The form of the eigenvalue distribution suggests new techniques for accelerating the learning process, and provides a theoretical justification for the choice of centered versus biased state variables.
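As a hedged illustration of the second-order quantities this abstract refers to (the toy data, sizes, and helper names below are assumptions, not taken from the paper): for a single linear unit trained with a mean-squared-error cost, the Hessian reduces to the input second-moment matrix, and the spread of its eigenvalues governs gradient-descent learning time. Centering the state variables shrinks that spread, which is the intuition behind preferring centered over biased representations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Biased (0/1) versus centered (-1/+1) state variables.
x_biased = rng.integers(0, 2, size=(1000, 20)).astype(float)   # values in {0, 1}
x_centered = 2.0 * x_biased - 1.0                               # values in {-1, +1}

def hessian_eigenvalues(x):
    """For a linear unit with MSE cost E = 0.5 * mean((w.x - t)^2),
    the Hessian d2E/dw2 is the input second-moment matrix."""
    h = x.T @ x / len(x)
    return np.linalg.eigvalsh(h)

for name, x in [("biased", x_biased), ("centered", x_centered)]:
    eig = hessian_eigenvalues(x)
    # A large eigenvalue spread (condition number) means slow gradient descent.
    print(f"{name:9s} condition number ~ {eig.max() / eig.min():.1f}")
```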
A Theory for Neural Networks with Time Delays
Vries, Bert de, Príncipe, José Carlos
We present a new neural network model for processing of temporal patterns. This model, the gamma neural model, is as general as a convolution delay model with arbitrary weight kernels w(t). We show that the gamma model can be formulated as a (partially prewired) additive model. A temporal hebbian learning rule is derived and we establish links to related existing models for temporal processing. 1 INTRODUCTION In this paper, we are concerned with developing neural nets with short term memory for processing of temporal patterns. In the literature, basically two ways have been reported to incorporate short-term memory in the neural system equations.
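A minimal sketch of the kind of short-term memory structure the gamma model provides (the discrete-time form, parameter names, and values below are assumptions for illustration, not the paper's formulation): each memory tap is a leaky integrator fed by the previous tap, so later taps hold progressively older, gamma-smoothed versions of the input.

```python
import numpy as np

def gamma_memory(signal, num_taps=4, mu=0.5):
    """Discrete-time gamma memory: tap k is a first-order low-pass filter of
    tap k-1, x_k[n] = (1 - mu) * x_k[n-1] + mu * x_{k-1}[n-1], with x_0[n] = input[n].
    Returns an array of shape (len(signal), num_taps + 1)."""
    taps = np.zeros(num_taps + 1)
    history = np.zeros((len(signal), num_taps + 1))
    for n, u in enumerate(signal):
        prev = taps.copy()
        taps[0] = u
        for k in range(1, num_taps + 1):
            taps[k] = (1.0 - mu) * prev[k] + mu * prev[k - 1]
        history[n] = taps
    return history

# Example: an impulse propagates through the taps with increasing delay and spread.
impulse = np.zeros(20); impulse[0] = 1.0
print(gamma_memory(impulse)[:6].round(3))
```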
Multi-Layer Perceptrons with B-Spline Receptive Field Functions
Lane, Stephen H., Flax, Marshall, Handelman, David, Gelfand, Jack
Multi-layer perceptrons are often slow to learn nonlinear functions with complex local structure due to the global nature of their function approximations. It is shown that standard multi-layer perceptrons are actually a special case of a more general network formulation that incorporates B-splines into the node computations. This allows novel spline network architectures to be developed that can combine the generalization capabilities and scaling properties of global multi-layer feedforward networks with the computational efficiency and learning speed of local computational paradigms. Simulation results are presented for the well known spiral problem of Weiland and of Lang and Witbrock to show the effectiveness of the Spline Net approach.
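The following is a hedged sketch of the B-spline receptive-field idea, not the paper's Spline Net architecture (knot spacing, degree, and weight initialization are illustrative assumptions): a node's output is a weighted sum of locally supported B-spline basis functions, so each weight only influences the response over a limited region of input space.

```python
import numpy as np

def bspline_basis(x, knots, degree, i):
    """Cox-de Boor recursion: value of the i-th B-spline basis function
    of the given degree at point x, over the given knot vector."""
    if degree == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + degree] != knots[i]:
        left = (x - knots[i]) / (knots[i + degree] - knots[i]) \
               * bspline_basis(x, knots, degree - 1, i)
    right = 0.0
    if knots[i + degree + 1] != knots[i + 1]:
        right = (knots[i + degree + 1] - x) / (knots[i + degree + 1] - knots[i + 1]) \
                * bspline_basis(x, knots, degree - 1, i + 1)
    return left + right

def spline_node(x, weights, knots, degree=3):
    """A node whose output is a weighted sum of local B-spline receptive fields;
    only the few weights whose basis functions cover x affect the output."""
    return sum(w * bspline_basis(x, knots, degree, i) for i, w in enumerate(weights))

knots = np.linspace(0.0, 1.0, 12)                                    # uniform knots on [0, 1]
weights = np.random.default_rng(1).normal(size=len(knots) - 3 - 1)   # one weight per cubic basis
print(spline_node(0.37, weights, knots))
```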
Learning to See Rotation and Dilation with a Hebb Rule
Sereno, Martin I., Sereno, Margaret E.
Sereno (1987) showed that a feedforward network with area V1-like input-layer units and a Hebb rule can develop area MT-like second-layer units that solve the aperture problem for pattern motion. The present study extends this earlier work to more complex motions. Saito et al. (1986) showed that neurons with large receptive fields in macaque visual area MST are sensitive to different senses of rotation and dilation, irrespective of the receptive field location of the movement singularity. A network with an MT-like second layer was trained and tested on combinations of rotating, dilating, and translating patterns. Third-layer units learn to detect specific senses of rotation or dilation in a position-independent fashion, despite having position-dependent direction selectivity within their receptive fields.
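As a minimal sketch of the learning rule involved (layer sizes, learning rate, and the normalization step are illustrative assumptions): a plain Hebb update strengthens each connection in proportion to the product of its presynaptic and postsynaptic activities.

```python
import numpy as np

def hebb_update(weights, pre, post, lr=0.01):
    """Plain Hebbian update: dW_ij = lr * post_i * pre_j, followed by a
    row-wise renormalization so the weights do not grow without bound."""
    weights = weights + lr * np.outer(post, pre)
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    return weights / np.maximum(norms, 1e-12)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(5, 100))      # 5 output units, 100 input units
for _ in range(200):
    pre = rng.normal(size=100)                # stand-in for direction-tuned input activities
    post = w @ pre                            # linear response of the output units
    w = hebb_update(w, pre, post)
```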
Simulation of the Neocognitron on a CCD Parallel Processing Architecture
Chuang, Michael L., Chiang, Alice M.
The neocognitron is a neural network for pattern recognition and feature extraction. An analog CCD parallel processing architecture developed at Lincoln Laboratory is particularly well suited to the computational requirements of shared-weight networks such as the neocognitron, and implementation of the neocognitron using the CCD architecture was simulated. A modification to the neocognitron training procedure, which improves network performance under the limited arithmetic precision that would be imposed by the CCD architecture, is presented.
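A hedged sketch of the precision constraint mentioned above (the bit width and uniform quantization scheme are assumptions, not details of the CCD design): rounding weights to a small fixed number of levels is one simple way to simulate limited arithmetic precision in software.

```python
import numpy as np

def quantize(weights, bits=6, w_max=1.0):
    """Round weights to 2**bits uniformly spaced levels in [-w_max, w_max],
    a crude stand-in for the limited precision of analog hardware."""
    levels = 2 ** bits - 1
    clipped = np.clip(weights, -w_max, w_max)
    return np.round((clipped + w_max) / (2 * w_max) * levels) / levels * (2 * w_max) - w_max

w = np.random.default_rng(0).normal(scale=0.3, size=(4, 4))
print(np.abs(w - quantize(w)).max())   # worst-case quantization error
```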
Evolution and Learning in Neural Networks: The Number and Distribution of Learning Trials Affect the Rate of Evolution
Learning can increase the rate of evolution of a population of biological organisms (the Baldwin effect). Our simulations show that in a population of artificial neural networks solving a pattern recognition problem, no learning or too much learning leads to slow evolution of the genes whereas an intermediate amount is optimal. Moreover, for a given total number of training presentations, fastest evolution occurs if different individuals within each generation receive different numbers of presentations, rather than equal numbers. Because genetic algorithms (GAs) help avoid local minima in energy functions, our hybrid learning-GA systems can be applied successfully to complex, high-dimensional pattern-recognition problems. INTRODUCTION The structure and function of a biological network derives from both its evolutionary precursors and real-time learning.
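A hedged sketch of a hybrid learning-GA loop of the kind described (the toy task, genome encoding, and hyperparameters are illustrative assumptions): each genome supplies the innate weights of a small perceptron, the individual receives some number of learning trials before its fitness is measured, and the number of trials per individual can be held equal or varied across the population while keeping the total budget fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pattern-recognition task: linearly separable patterns with a fixed target rule.
patterns = rng.normal(size=(50, 8))
labels = (patterns @ rng.normal(size=8) > 0).astype(float)

def accuracy(w):
    return np.mean(((patterns @ w) > 0).astype(float) == labels)

def learn(w, trials, lr=0.1):
    """Perceptron-style learning trials applied to an individual's innate weights."""
    for _ in range(trials):
        i = rng.integers(len(patterns))
        pred = float(patterns[i] @ w > 0)
        w = w + lr * (labels[i] - pred) * patterns[i]
    return w

def evolve(generations=30, pop_size=20, trials_per_individual=None):
    """Genome = innate weights. Fitness is measured after learning, but only the
    innate (unlearned) weights are inherited, as in the Baldwin effect."""
    population = rng.normal(scale=0.1, size=(pop_size, 8))
    for _ in range(generations):
        trials = trials_per_individual(pop_size) if trials_per_individual else [20] * pop_size
        fitness = np.array([accuracy(learn(w.copy(), t)) for w, t in zip(population, trials)])
        parents = population[np.argsort(fitness)[-pop_size // 2:]]
        population = parents[rng.integers(len(parents), size=pop_size)] \
                     + rng.normal(scale=0.05, size=(pop_size, 8))
    return max(accuracy(learn(w.copy(), 20)) for w in population)

# Equal trials for everyone versus unequal trials with the same average budget.
print("equal  :", evolve(trials_per_individual=lambda n: [20] * n))
print("varied :", evolve(trials_per_individual=lambda n: list(rng.integers(0, 41, size=n))))
```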
Transforming Neural-Net Output Levels to Probability Distributions
Denker, John S., LeCun, Yann
The outputs of a typical multi-output classification network do not satisfy the axioms of probability; probabilities should be positive and sum to one. This problem can be solved by treating the trained network as a preprocessor that produces a feature vector that can be further processed, for instance by classical statistical estimation techniques. It is particularly useful to combine these two ideas: we implement the ideas of section 1 using Parzen windows, where the shape and relative size of each window is computed using the ideas of section 2. This allows us to make contact between important theoretical ideas (e.g. the ensemble formalism) and practical techniques. Our results also shed new light on and generalize the well-known "softmax" scheme. 1 Distribution of Categories in Output Space In many neural-net applications, it is crucial to produce a set of C numbers that serve as estimates of the probability of C mutually exclusive outcomes. For example, in speech recognition, these numbers represent the probability of C different phonemes; the probabilities of successive segments can be combined using a Hidden Markov Model.
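As a minimal, standard example of the normalization issue raised in the first sentence (this is the ordinary softmax, not the paper's full probabilistic treatment): raw output levels need not be positive or sum to one, but exponentiating and normalizing them yields values that satisfy both axioms.

```python
import numpy as np

def softmax(scores):
    """Map raw network output levels to a categorical distribution:
    all entries positive, summing to one. Shifting by the max is for
    numerical stability and does not change the result."""
    z = np.exp(scores - np.max(scores))
    return z / z.sum()

raw = np.array([2.3, -1.1, 0.4])       # arbitrary output levels for C = 3 classes
p = softmax(raw)
print(p, p.sum())                      # positive entries, sums to 1.0
```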
A Short-Term Memory Architecture for the Learning of Morphophonemic Rules
In the debate over the power of connectionist models to handle linguistic phenomena, considerable attention has been focused on the learning of simple morphological rules. It is a straightforward matter in a symbolic system to specify how the meanings of a stem and a bound morpheme combine to yield the meaning of a whole word and how the form of the bound morpheme depends on the shape of the stem. In a distributed connectionist system, however, where there may be no explicit morphemes, words, or rules, things are not so simple. The most important work in this area has been that of Rumelhart and McClelland (1986), together with later extensions by Marchman and Plunkett (1989). The networks involved were trained to associate English verb stems with the corresponding past-tense forms, successfully generating both regular and irregular forms and generalizing to novel inputs. This work established that rule-like linguistic behavior could be achieved in a system with no explicit rules. However, it did have important limitations, among them the following: 1. The representation of linguistic form was inadequate. This is clear, for example, from the fact that distinct lexical items may be associated with identical representations (Pinker & Prince, 1988).
Navigating through Temporal Difference
Barto, Sutton and Watkins [2] introduced a grid task as a didactic example of temporal difference planning and asynchronous dynamical programming. This paper considers the effects of changing the coding of the input stimulus, and demonstrates that the self-supervised learning of a particular form of hidden unit representation improves performance.
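A hedged sketch of the tabular temporal-difference update underlying such a grid task (grid size, rewards, policy, and step sizes are illustrative assumptions, not those of the cited task): the value of the current state is nudged toward the immediate reward plus the discounted value of the next state.

```python
import numpy as np

rng = np.random.default_rng(0)

# 5x5 grid; the goal is the bottom-right cell, each step costs -1.
size, goal = 5, (4, 4)
values = np.zeros((size, size))
alpha, gamma = 0.1, 1.0

for _ in range(2000):                      # episodes of a random-walk policy
    state = (0, 0)
    while state != goal:
        dr, dc = [(0, 1), (0, -1), (1, 0), (-1, 0)][rng.integers(4)]
        nxt = (min(max(state[0] + dr, 0), size - 1),
               min(max(state[1] + dc, 0), size - 1))
        reward = -1.0
        # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
        values[state] += alpha * (reward + gamma * values[nxt] - values[state])
        state = nxt

print(values.round(1))                     # values grow more negative far from the goal
```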