Generalization by Weight-Elimination with Application to Forecasting
Weigend, Andreas S., Rumelhart, David E., Huberman, Bernardo A.
Bernardo A. Huberman, Dynamics of Computation, Xerox PARC, Palo Alto, CA 94304

Inspired by the information-theoretic idea of minimum description length, we add a term to the back-propagation cost function that penalizes network complexity. We give the details of the procedure, called weight-elimination, describe its dynamics, and clarify the meaning of the parameters involved. From a Bayesian perspective, the complexity term can be usefully interpreted as an assumption about the prior distribution of the weights. We use this procedure to predict the sunspot time series and the notoriously noisy series of currency exchange rates.

1 INTRODUCTION

Learning procedures for connectionist networks are essentially statistical devices for performing inductive inference. There is a tradeoff between two goals: on the one hand, we want such devices to be as general as possible so that they are able to learn a broad range of problems.
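The weight-elimination penalty has the form λ Σᵢ (wᵢ/w₀)² / (1 + (wᵢ/w₀)²): weights much smaller than the scale w₀ are taxed roughly quadratically, like ordinary weight decay, while large weights saturate toward a fixed cost, so a few large weights are tolerated and many small ones are pushed to zero. A minimal numpy sketch of the term and its gradient (function names and the values of λ and w₀ are illustrative, not from the paper):

```python
import numpy as np

def weight_elimination_penalty(weights, w0=1.0, lam=1e-4):
    """Complexity term: lam * sum (w/w0)^2 / (1 + (w/w0)^2)."""
    s = (weights / w0) ** 2
    return lam * np.sum(s / (1.0 + s))

def weight_elimination_grad(weights, w0=1.0, lam=1e-4):
    """Gradient of the penalty, added to the back-propagation gradient."""
    s = (weights / w0) ** 2
    return lam * (2.0 * weights / w0 ** 2) / (1.0 + s) ** 2
```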
A Delay-Line Based Motion Detection Chip
Horiuchi, Tim, Lazzaro, John, Moore, Andrew, Koch, Christof
Inspired by a visual motion detection model for the rabbit retina and by a computational architecture used for early audition in the barn owl, we have designed a chip that employs a correlation model to report the one-dimensional field motion of a scene in real time. Using subthreshold analog VLSI techniques, we have fabricated and successfully tested an 8000-transistor chip using a standard MOSIS process.
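The chip itself is analog hardware, but the delay-and-correlate (Reichardt-style) model it implements can be sketched in software: each unit multiplies the delayed signal from one photoreceptor by the undelayed signal from its neighbor, and the difference between the two mirror-image products signals direction. A minimal numpy sketch under those assumptions (discrete time, one-sample delay; not the chip's actual circuit):

```python
import numpy as np

def correlation_motion(frames, delay=1):
    """1-D delay-and-correlate motion estimate.

    frames: shape (T, N), intensity at N photoreceptors over T time steps.
    Returns one value per time step; positive = rightward, negative = leftward.
    """
    delayed = np.roll(frames, delay, axis=0)             # delay-line output
    right = delayed[delay:, :-1] * frames[delay:, 1:]    # delayed left x current right
    left = frames[delay:, :-1] * delayed[delay:, 1:]     # current left x delayed right
    return (right - left).sum(axis=1)
```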
A competitive modular connectionist architecture
Jacobs, Robert A., Jordan, Michael I.
We describe a multi-network, or modular, connectionist architecture that captures the fact that many tasks have structure at a level of granularity intermediate to that assumed by local and global function approximation schemes. The main innovation of the architecture is that it combines associative and competitive learning in order to learn task decompositions. A task decomposition is discovered by forcing the networks comprising the architecture to compete to learn the training patterns. As a result of the competition, different networks learn different training patterns and, thus, learn to partition the input space. The performance of the architecture on a "what" and "where" vision task and on a multi-payload robotics task is presented.
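A minimal sketch of the competitive mechanism: several expert networks all see each training pattern, softmax responsibilities reward the experts with the lowest error, and each expert's update is scaled by its responsibility, so different experts come to own different regions of the input space. This is an illustrative reconstruction, not the paper's exact architecture or cost function (the linear experts and the particular gating update are assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(experts, gate, x, y, lr=0.1):
    """One competitive update: low-error experts win larger softmax
    responsibilities and therefore receive more of the learning signal."""
    preds = [W @ x for W in experts]
    errs = np.array([np.sum((y - p) ** 2) for p in preds])
    g = softmax(gate @ x - errs)        # gating logits reweighted by performance
    for i, W in enumerate(experts):
        experts[i] = W + lr * g[i] * np.outer(y - preds[i], x)  # winner learns most
    gate += lr * np.outer(g - softmax(gate @ x), x)  # move gating toward the winners
    return float(errs @ g)
```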
On Stochastic Complexity and Admissible Models for Neural Network Classifiers
Padhraic Smyth, Communications Systems Research, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109

Abstract

Given some training data, how should we choose a particular network classifier from a family of networks of different complexities? In this paper we discuss how the application of stochastic complexity theory to classifier design problems can provide some insights into this problem. In particular, we introduce the notion of admissible models, whereby the complexity of models under consideration is affected by (among other factors) the class entropy, the amount of training data, and our prior belief. We then discuss the implications of these results with respect to neural architectures and demonstrate the approach on real data from a medical diagnosis task.

1 Introduction and Motivation

In this paper we examine in a general sense the application of Minimum Description Length (MDL) techniques to the problem of selecting a good classifier from a large set of candidate models or hypotheses. Pattern recognition algorithms differ from more conventional statistical modeling techniques in the sense that they typically choose from a very large number of candidate models to describe the available data.
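The core MDL computation can be sketched directly: a candidate classifier's score is the number of bits needed to encode its parameters plus the bits needed to encode the training labels given its predictions, and the shortest total code wins. The (k/2) log₂ n parameter cost below is the standard asymptotic choice; the paper's admissibility constraints, which further prune the candidate set, are omitted from this sketch:

```python
import numpy as np

def description_length(data_bits, n_params, n_train):
    """Two-part code length in bits: L(model) + L(data | model)."""
    return 0.5 * n_params * np.log2(n_train) + data_bits

def label_code_bits(probs, labels):
    """Bits needed to encode the true labels under the classifier's
    predicted class probabilities: -sum_i log2 p(label_i)."""
    return float(-np.sum(np.log2(probs[np.arange(len(labels)), labels])))

# Model selection: among candidates (probs_k, n_params_k), pick the one
# minimizing description_length(label_code_bits(probs_k, y), n_params_k, N).
```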
Adjoint-Functions and Temporal Learning Algorithms in Neural Networks
The development of learning algorithms is generally based upon the minimization of an energy function. It is a fundamental requirement to compute the gradient of this energy function with respect to the various parameters of the neural architecture, e.g., synaptic weights, neural gain, etc. In principle, this requires solving a system of nonlinear equations for each parameter of the model, which is computationally very expensive. A new methodology for neural learning of time-dependent nonlinear mappings is presented. It exploits the concept of adjoint operators to enable a fast global computation of the network's response to perturbations in all of the system's parameters. The importance of the time boundary conditions of the adjoint functions is discussed. An algorithm is presented in which the adjoint sensitivity equations are solved simultaneously (i.e., forward in time) along with the nonlinear dynamics of the neural networks. This methodology makes real-time applications and hardware implementation of temporal learning feasible.
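The computational point can be made concrete with the standard adjoint recursion for a discretized dynamical system: one extra linear system, integrated in the adjoint variables, yields the sensitivity of the loss to every parameter at once, instead of one nonlinear solve per parameter. The sketch below uses the conventional backward-in-time adjoint for a terminal loss; the paper's contribution is an algorithm that instead integrates the adjoint equations forward in time, which this sketch does not reproduce:

```python
import numpy as np

def adjoint_gradient(x0, w, f, dfdx, dfdw, dLdx, dt=0.01, steps=100):
    """Gradient of a terminal loss L(x_T) w.r.t. parameters w for the
    discretized dynamics x_{t+1} = x_t + dt * f(x_t, w).

    f(x, w) -> (n,); dfdx -> (n, n) Jacobian; dfdw -> (n, p) Jacobian;
    dLdx(x_T) -> (n,). One forward pass plus one adjoint sweep gives the
    sensitivity with respect to all p parameters simultaneously.
    """
    xs = [x0]
    for _ in range(steps):                     # forward pass: store the trajectory
        xs.append(xs[-1] + dt * f(xs[-1], w))
    lam = dLdx(xs[-1])                         # adjoint boundary condition at t = T
    grad = np.zeros_like(w)
    for t in reversed(range(steps)):           # adjoint sweep, backward in time
        grad += dt * lam @ dfdw(xs[t], w)      # accumulate parameter sensitivity
        lam = lam + dt * lam @ dfdx(xs[t], w)  # propagate the adjoint variable
    return grad
```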
Analog Neural Networks as Decoders
Erlanson, Ruth, Abu-Mostafa, Yaser
In turn, k-winner-take-all (KWTA) networks can be used as decoders of a class of nonlinear error-correcting codes. By interconnecting such KWTA networks, we can construct decoders capable of decoding more powerful codes. We consider several families of interconnected KWTA networks, analyze their performance in terms of coding theory metrics, and consider the feasibility of embedding such networks in VLSI technologies.
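Concretely, for the code consisting of all length-n binary words of weight k, KWTA decoding is maximum-likelihood under additive Gaussian noise: the k largest analog inputs win, and the output codeword has ones exactly at the winning positions. A minimal sketch (the function name and example values are illustrative):

```python
import numpy as np

def kwta_decode(received, k):
    """k-winner-take-all decoding of a constant-weight binary codeword.

    Selecting the k largest components maximizes the correlation r . c
    over all weight-k codewords c, i.e., ML decoding under AWGN.
    """
    winners = np.argsort(received)[-k:]
    word = np.zeros(len(received), dtype=int)
    word[winners] = 1
    return word

# e.g. a weight-2 codeword of length 5 through a noisy channel:
# kwta_decode(np.array([0.9, 0.1, -0.2, 1.1, 0.0]), k=2) -> array([1, 0, 0, 1, 0])
```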
Exploratory Feature Extraction in Speech Signals
A novel unsupervised neural network for dimensionality reduction which seeks directions emphasizing multimodality is presented, and its connection to exploratory projection pursuit methods is discussed. This leads to a new statistical insight into the synaptic modification equations governing learning in Bienenstock, Cooper, and Munro (BCM) neurons (1982). The importance of a dimensionality reduction principle based solely on distinguishing features is demonstrated using a linguistically motivated phoneme recognition experiment, and compared with feature extraction using a back-propagation network.

1 Introduction

Due to the curse of dimensionality (Bellman, 1961) it is desirable to extract features from a high-dimensional data space before attempting classification. How to perform this feature extraction/dimensionality reduction is not that clear. A first simplification is to consider only features defined by linear (or semi-linear) projections of high-dimensional data.
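The BCM modification rule that the paper reinterprets drives a synaptic weight according to Δw ∝ y(y − θ)x, where the modification threshold θ slides as a running average of y². A minimal single-neuron sketch (the learning rate, threshold time constant, and exponential-average form of θ are illustrative choices):

```python
import numpy as np

def bcm_step(w, x, theta, lr=0.01, tau=100.0):
    """One BCM update for a linear neuron y = w . x.

    Activity above the sliding threshold theta potentiates the active
    synapses; activity below it depresses them, which pushes the learned
    projection toward directions with multimodal output distributions.
    """
    y = w @ x
    w = w + lr * y * (y - theta) * x        # dw ~ y (y - theta) x
    theta = theta + (y ** 2 - theta) / tau  # theta tracks E[y^2]
    return w, theta
```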
Flight Control in the Dragonfly: A Neurobiological Simulation
Faller, William E., Luttges, Marvin W.
Neural network simulations of the dragonfly flight neurocontrol system have been developed to understand how this insect uses complex, unsteady aerodynamics. The simulation networks account for the ganglionic spatial distribution of cells as well as the physiologic operating range and the stochastic cellular fIring history of each neuron. In addition the motor neuron firing patterns, "flight command sequences", were utilized. Simulation training was targeted against both the cellular and flight motor neuron firing patterns. The trained networks accurately resynthesized the intraganglionic cellular firing patterns. These in tum controlled the motor neuron fIring patterns that drive wing musculature during flight. Such networks provide both neurobiological analysis tools and fIrst generation controls for the use of "unsteady" aerodynamics.