A Competitive Modular Connectionist Architecture

Neural Information Processing Systems

We describe a multi-network, or modular, connectionist architecture that captures the fact that many tasks have structure at a level of granularity intermediate to that assumed by local and global function approximation schemes. The main innovation of the architecture is that it combines associative and competitive learning in order to learn task decompositions. A task decomposition is discovered by forcing the networks comprising the architecture to compete to learn the training patterns. As a result of the competition, different networks learn different training patterns and, thus, learn to partition the input space. The performance of the architecture on a "what" and "where" vision task and on a multi-payload robotics task is presented.
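A minimal sketch of the competitive principle described above, assuming squared-error linear experts and a hard winner-take-all update; the expert form and learning rate are illustrative stand-ins, not the paper's exact architecture:

    import numpy as np

    # Illustrative only: K linear "expert" networks compete for each
    # training pattern; the expert with the lowest squared error wins
    # and is the only one that updates its weights, so the experts
    # gradually partition the input space.
    rng = np.random.default_rng(0)
    K, d_in, d_out, lr = 3, 4, 2, 0.1
    W = [rng.normal(scale=0.1, size=(d_out, d_in)) for _ in range(K)]

    def train_step(x, y):
        errs = [np.sum((y - Wk @ x) ** 2) for Wk in W]
        k = int(np.argmin(errs))                # competition: best expert wins
        grad = -2.0 * np.outer(y - W[k] @ x, x)
        W[k] -= lr * grad                       # only the winner learns
        return k

    x, y = rng.normal(size=d_in), rng.normal(size=d_out)
    print("winning network:", train_step(x, y))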


On Stochastic Complexity and Admissible Models for Neural Network Classifiers

Neural Information Processing Systems

Padhraic Smyth, Communications Systems Research, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109

Given some training data, how should we choose a particular network classifier from a family of networks of different complexities? In this paper we discuss how the application of stochastic complexity theory to classifier design problems can provide some insights into this question. In particular, we introduce the notion of admissible models, whereby the complexity of the models under consideration is affected by (among other factors) the class entropy, the amount of training data, and our prior belief. We then discuss the implications of these results with respect to neural architectures and demonstrate the approach on real data from a medical diagnosis task.

1 Introduction and Motivation

In this paper we examine, in a general sense, the application of Minimum Description Length (MDL) techniques to the problem of selecting a good classifier from a large set of candidate models or hypotheses. Pattern recognition algorithms differ from more conventional statistical modeling techniques in the sense that they typically choose from a very large number of candidate models to describe the available data.
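As a rough illustration of this style of selection, the following sketch scores hypothetical candidate classifiers by a two-part description length (a BIC-style approximation to stochastic complexity); the candidate networks and their likelihood values are placeholders, not the paper's models:

    import numpy as np

    # A model's "stochastic complexity" is approximated here by a
    # two-part code length: data misfit (negative log-likelihood)
    # plus a parameter cost of (k/2) log N.
    def description_length(neg_log_lik, n_params, n_samples):
        return neg_log_lik + 0.5 * n_params * np.log(n_samples)

    candidates = [
        {"name": "small net",  "neg_log_lik": 900.0, "n_params": 25},
        {"name": "medium net", "neg_log_lik": 400.0, "n_params": 120},
        {"name": "large net",  "neg_log_lik": 380.0, "n_params": 600},
    ]
    N = 1000  # amount of training data
    best = min(candidates,
               key=lambda m: description_length(m["neg_log_lik"],
                                                m["n_params"], N))
    print(best["name"])  # -> "medium net": extra fit no longer pays its cost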



Adjoint-Functions and Temporal Learning Algorithms in Neural Networks

Neural Information Processing Systems

The development of learning algorithms is generally based upon the minimization of an energy function. It is a fundamental requirement to compute the gradient of this energy function with respect to the various parameters of the neural architecture, e.g., synaptic weights, neural gain, etc. In principle, this requires solving a system of nonlinear equations for each parameter of the model, which is computationally very expensive. A new methodology for neural learning of time-dependent nonlinear mappings is presented. It exploits the concept of adjoint operators to enable a fast global computation of the network's response to perturbations in all the system's parameters. The importance of the time boundary conditions of the adjoint functions is discussed. An algorithm is presented in which the adjoint sensitivity equations are solved simultaneously (i.e., forward in time) along with the nonlinear dynamics of the neural networks. This methodology makes real-time applications and hardware implementation of temporal learning feasible.
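For intuition, here is a minimal discrete-time sketch of the adjoint idea on a toy network x_{t+1} = tanh(W x_t); it uses the classical backward adjoint recursion rather than the forward-in-time variant the paper develops, and the network and error function are assumed for illustration:

    import numpy as np

    # Toy discrete-time network with terminal error
    # E = 0.5 * ||x_T - target||^2. The adjoint (costate) recursion
    # collects dE/dW in a single sweep instead of re-solving the
    # dynamics for every individual parameter.
    def adjoint_gradient(W, x0, target, T):
        xs = [x0]
        for _ in range(T):                       # forward pass: dynamics
            xs.append(np.tanh(W @ xs[-1]))
        lam = xs[-1] - target                    # terminal adjoint condition
        dW = np.zeros_like(W)
        for t in range(T - 1, -1, -1):           # adjoint recursion
            local = lam * (1.0 - xs[t + 1] ** 2) # lam * tanh'(W x_t)
            dW += np.outer(local, xs[t])
            lam = W.T @ local
        return dW

    W = np.array([[0.5, -0.3], [0.2, 0.1]])
    print(adjoint_gradient(W, np.ones(2), np.zeros(2), T=5))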


A Multiscale Adaptive Network Model of Motion Computation in Primates

Neural Information Processing Systems

We demonstrate a multiscale adaptive network model of motion computation in primate area MT. The model consists of two stages: (1) local velocities are measured across multiple spatiotemporal channels, and (2) the optical flow field is computed by a network of direction-selective neurons at multiple spatial resolutions. This model embeds the computational efficiency of multigrid algorithms within a parallel network and adaptively computes the most reliable estimate of the flow field across different spatial scales. Our model neurons show the same nonclassical receptive field properties as Allman's type I MT neurons. Since local velocities are measured across multiple channels, various channels often provide conflicting measurements to the network. We have incorporated a veto scheme for conflict resolution. This mechanism provides a novel explanation for the spatial frequency dependency of the psychophysical phenomenon called Motion Capture.

1 MOTIVATION

We previously developed a two-stage model of motion computation in the visual system of primates (i.e.
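One hypothetical reading of such a veto rule, with scalar channel velocities and confidences standing in for the network's actual population activity; the thresholding details are assumptions, not the paper's mechanism:

    import numpy as np

    # Each channel reports a scalar velocity estimate and a confidence.
    # If the estimates disagree beyond a tolerance, the most confident
    # channel vetoes the rest; otherwise the estimates are blended.
    def resolve(velocities, confidences, tol=0.5):
        v = np.asarray(velocities, float)
        c = np.asarray(confidences, float)
        if np.ptp(v) > tol:                  # conflict: spread too large
            return v[np.argmax(c)]           # winner vetoes the others
        return np.average(v, weights=c)      # agreement: weighted blend

    print(resolve([1.0, 1.1, 0.9], [0.5, 0.8, 0.6]))  # blend -> ~1.01
    print(resolve([1.0, 2.5, 0.9], [0.5, 0.8, 0.6]))  # veto  -> 2.5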



Analog Neural Networks as Decoders

Neural Information Processing Systems

Analog neural networks with feedback can be used to implement K-winner-take-all (KWTA) networks. In turn, KWTA networks can be used as decoders of a class of nonlinear error-correcting codes. By interconnecting such KWTA networks, we can construct decoders capable of decoding more powerful codes. We consider several families of interconnected KWTA networks, analyze their performance in terms of coding theory metrics, and consider the feasibility of embedding such networks in VLSI technologies.
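A minimal sketch of the decoding operation a KWTA network computes at equilibrium, assuming a code whose words contain exactly K ones:

    import numpy as np

    # At equilibrium a KWTA network turns on exactly the K units with
    # the largest inputs; for a code whose words contain exactly K ones,
    # this maps a noisy received vector to the nearest such binary word.
    def kwta_decode(received, k):
        r = np.asarray(received, float)
        out = np.zeros_like(r)
        out[np.argsort(r)[-k:]] = 1.0        # the K largest inputs win
        return out

    print(kwta_decode([0.9, 0.2, 0.7, -0.1, 0.4], k=2))  # [1. 0. 1. 0. 0.]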


Shaping the State Space Landscape in Recurrent Networks

Neural Information Processing Systems

Bernard Victorri, ELSAP, Universite de Caen, 14032 Caen Cedex, France

Fully recurrent (asymmetrical) networks can be thought of as dynamical systems. The dynamics can be shaped to implement content-addressable memories, recognize sequences, or generate trajectories. Unfortunately, several problems can arise: First, convergence in the state space is not guaranteed. Second, the learned fixed points or trajectories are not necessarily stable. Finally, there might exist spurious fixed points and/or spurious "attracting" trajectories that do not correspond to any patterns.
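The stability concern can be made concrete with a standard check on a toy additive network dx/dt = -x + W tanh(x) (this network model is assumed for illustration): a learned fixed point is locally stable only if the Jacobian there has eigenvalues with negative real parts.

    import numpy as np

    # A fixed point x* of dx/dt = -x + W tanh(x) is locally stable only
    # if the Jacobian -I + W diag(tanh'(x*)) has eigenvalues with
    # negative real parts; learned fixed points failing this test are
    # the unstable or spurious attractors described above.
    def is_stable(W, x_star):
        D = np.diag(1.0 - np.tanh(x_star) ** 2)    # tanh'(x*) on diagonal
        J = -np.eye(len(x_star)) + W @ D           # Jacobian of the flow
        return bool(np.all(np.linalg.eigvals(J).real < 0))

    W = np.array([[0.2, -1.0], [1.0, 0.2]])
    print(is_stable(W, np.zeros(2)))               # True: origin is stable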


Evaluation of Adaptive Mixtures of Competing Experts

Neural Information Processing Systems

We compare the performance of the modular architecture, composed of competing expert networks, suggested by Jacobs, Jordan, Nowlan and Hinton (1991) to the performance of a single back-propagation network on a complex, but low-dimensional, vowel recognition task. Simulations reveal that this system is capable of uncovering interesting decompositions in a complex task. The type of decomposition is strongly influenced by the nature of the input to the gating network that decides which expert to use for each case. The modular architecture also exhibits consistently better generalization on many variations of the task.

1 Introduction

If back-propagation is used to train a single, multilayer network to perform different subtasks on different occasions, there will generally be strong interference effects which lead to slow learning and poor generalization. If we know in advance that a set of training cases may be naturally divided into subsets that correspond to distinct subtasks, interference can be reduced by using a system (see Figure 1) composed of several different "expert" networks plus a gating network that decides which of the experts should be used for each training case. Systems of this type have been suggested by a number of authors (Hampshire and Waibel, 1989; Jacobs, Jordan and Barto, 1990; Jacobs et al., 1991) (see also the paper by Jacobs and Jordan in this volume (1991)).
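A minimal sketch of the forward pass of such a system, with single-layer experts and a softmax gating network standing in for the networks of Figure 1; the sizes and single-layer form are illustrative simplifications:

    import numpy as np

    # Single-layer experts each propose an output; a gating network
    # produces a softmax over experts that weights (or, in the hard
    # competitive limit, selects) the expert outputs.
    rng = np.random.default_rng(1)
    K, d_in, d_out = 3, 8, 2
    experts = [0.1 * rng.normal(size=(d_out, d_in)) for _ in range(K)]
    W_gate = 0.1 * rng.normal(size=(K, d_in))

    def forward(x):
        g = np.exp(W_gate @ x)
        g /= g.sum()                              # gating probabilities
        outs = np.stack([E @ x for E in experts]) # one output per expert
        return g @ outs, g                        # blended output, gates

    y, gates = forward(rng.normal(size=d_in))
    print("gates:", np.round(gates, 2))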