Maximum Likelihood Competitive Learning

Neural Information Processing Systems

One popular class of unsupervised algorithms are competitive algorithms. In the traditional view of competition, only one competitor, the winner, adapts for any given case. I propose to view competitive adaptation as attempting to fit a blend of simple probability generators (such as Gaussians) to a set of data points. The maximum likelihood fit of a model of this type suggests a "softer" form of competition, in which all competitors adapt in proportion to the relative probability that the input came from each competitor. I investigate one application of the soft competitive model, placement of radial basis function centers for function interpolation, and show that the soft model can give better performance with little additional computational cost.

1 INTRODUCTION

Interest in unsupervised learning has increased recently due to the application of more sophisticated mathematical tools (Linsker, 1988; Plumbley and Fallside, 1988; Sanger, 1989) and the success of several elegant simulations of large-scale self-organization (Linsker, 1986; Kohonen, 1982). One popular class of unsupervised algorithms are competitive algorithms, which have appeared as components in a variety of systems (Von der Malsburg, 1973; Fukushima, 1975; Grossberg, 1978). Generalizing the definition of Rumelhart and Zipser (1986), a competitive adaptive system consists of a collection of modules which are structurally identical except, possibly, for random initial parameter variation.
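The contrast between winner-take-all and "soft" competition described above can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: it assumes isotropic Gaussian competitors with equal priors and a shared, fixed width `sigma`, and the function names (`responsibilities`, `soft_update`, `hard_update`) and the learning rate `lr` are hypothetical.

```python
import math


def squared_distances(x, centers):
    """Squared Euclidean distance from input x to every center."""
    return [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]


def responsibilities(x, centers, sigma=1.0):
    """Relative probability that x came from each Gaussian competitor.

    Assumes equal priors and a shared width sigma (an assumption of this sketch).
    """
    sq = squared_distances(x, centers)
    m = min(sq)  # shift by the minimum for numerical stability
    w = [math.exp(-(d - m) / (2.0 * sigma ** 2)) for d in sq]
    total = sum(w)
    return [wi / total for wi in w]


def soft_update(x, centers, lr=0.1, sigma=1.0):
    """Soft competition: every center moves toward x, scaled by its responsibility."""
    r = responsibilities(x, centers, sigma)
    return [[ci + lr * rk * (xi - ci) for xi, ci in zip(x, c)]
            for c, rk in zip(centers, r)]


def hard_update(x, centers, lr=0.1):
    """Traditional competition: only the nearest center (the winner) moves."""
    sq = squared_distances(x, centers)
    winner = sq.index(min(sq))
    return [[ci + lr * (xi - ci) for xi, ci in zip(x, c)] if k == winner else list(c)
            for k, c in enumerate(centers)]
```

With an input equidistant from two centers, `soft_update` moves both centers by half a step each, whereas `hard_update` moves only the first-found winner by a full step.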


The "Moving Targets" Training Algorithm

Neural Information Processing Systems

A simple method for training the dynamical behavior of a neural network is derived. It is applicable to any training problem in discrete-time networks with arbitrary feedback. The algorithm resembles back-propagation in that an error function is minimized using a gradient-based method, but the optimization is carried out in the hidden part of state space either instead of, or in addition to, weight space. Computational results are presented for some simple dynamical training problems, one of which requires response to a signal 100 time steps in the past.

1 INTRODUCTION

This paper presents a minimization-based algorithm for training the dynamical behavior of a discrete-time neural network model. The central idea is to treat hidden nodes as target nodes with variable training data.


The CHIR Algorithm for Feed Forward Networks with Binary Weights

Neural Information Processing Systems

A new learning algorithm, Learning by Choice of Internal Representations (CHIR), was recently introduced. Whereas many algorithms reduce the learning process to minimizing a cost function over the weights, our method treats the internal representations as the fundamental entities to be determined. The algorithm applies a search procedure in the space of internal representations, and a cooperative adaptation of the weights (e.g., by using the perceptron learning rule). Since the introduction of its basic, single-output version, the CHIR algorithm has been generalized to train any feed-forward network of binary neurons. Here we present the generalized version of the CHIR algorithm, and further demonstrate its versatility by describing how it can be modified in order to train networks with binary (±1) weights. Preliminary tests of this binary version on the random teacher problem are also reported.


Adjoint Operator Algorithms for Faster Learning in Dynamical Neural Networks

Neural Information Processing Systems

A methodology for faster supervised learning in dynamical nonlinear neural networks is presented. It exploits the concept of adjoint operators to enable computation of changes in the network's response due to perturbations in all system parameters, using the solution of a single set of appropriately constructed linear equations. The lower bound on speedup per learning iteration over conventional methods for calculating the neuromorphic energy gradient is O(N^2), where N is the number of neurons in the network.

1 INTRODUCTION

The biggest promise of artificial neural networks as computational tools lies in the hope that they will enable fast processing and synthesis of complex information patterns. In particular, considerable efforts have recently been devoted to the formulation of efficient methodologies for learning (e.g., Rumelhart et al., 1986; Pineda, 1988; Pearlmutter, 1989; Williams and Zipser, 1989; Barhen, Gulati and Zak, 1989). The development of learning algorithms is generally based upon the minimization of a neuromorphic energy function. The fundamental requirement of such an approach is the computation of the gradient of this objective function with respect to the various parameters of the neural architecture, e.g., synaptic weights, neural ...


Learning in Higher-Order "Artificial Dendritic Trees"

Neural Information Processing Systems

The computational territory between the linearly summing McCulloch-Pitts neuron and the nonlinear differential equations of Hodgkin & Huxley is relatively sparsely populated. Connectionists use variants of the former and computational neuroscientists struggle with the exploding parameter spaces provided by the latter. However, evidence from biophysical simulations suggests that the voltage transfer properties of synapses, spines and dendritic membranes involve many detailed nonlinear interactions, not just a squashing function at the cell body. Real neurons may indeed be higher-order nets. For the computationally minded, higher-order interactions mean, first of all, quadratic terms. This contribution presents a simple learning principle for a binary tree with a logistic/quadratic transfer function at each node. These functions, though highly nested, are shown to be capable of changing their shape in concert. The resulting tree structure receives inputs at its leaves, and outputs at the top an estimate of the probability that the input pattern is a member of one of two classes.
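To make the idea of a binary tree with a logistic/quadratic node function concrete, here is a minimal forward-pass sketch. The abstract does not give the exact node function, so the form used here (a logistic applied to a bias, two linear terms, and one product term of the two child outputs) is an assumption, as are the names `node` and `tree_output` and the bottom-up weight layout. No learning rule is shown.

```python
import math


def logistic(z):
    """Standard logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-z))


def node(left, right, w):
    """One tree node: logistic of a quadratic form in its two children.

    w = (bias, w_left, w_right, w_product); the product term supplies the
    second-order (quadratic) interaction. This form is hypothetical.
    """
    bias, w_left, w_right, w_product = w
    return logistic(bias + w_left * left + w_right * right + w_product * left * right)


def tree_output(leaves, weights):
    """Evaluate a complete binary tree bottom-up.

    leaves:  list of 2**depth input values.
    weights: one list per level (bottom level first), each holding one
             4-tuple of weights per node at that level.
    Returns the root output, a value in (0, 1) read as a class probability.
    """
    level = list(leaves)
    for level_weights in weights:
        level = [node(level[2 * i], level[2 * i + 1], w)
                 for i, w in enumerate(level_weights)]
    return level[0]
```

For example, a depth-2 tree takes four leaf values, two weight tuples for the bottom level and one for the root; with all weights zero every node outputs 0.5.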



Predicting Weather Using a Genetic Memory: A Combination of Kanerva's Sparse Distributed Memory with Holland's Genetic Algorithms

Neural Information Processing Systems

Kanerva's sparse distributed memory (SDM) is an associative-memory model based on the mathematical properties of high-dimensional binary address spaces. Holland's genetic algorithms are a search technique for high-dimensional spaces inspired by evolutionary processes of DNA. "Genetic Memory" is a hybrid of the above two systems, in which the memory uses a genetic algorithm to dynamically reconfigure its physical storage locations to reflect correlations between the stored addresses and data. For example, when presented with raw weather station data, the Genetic Memory discovers specific features in the weather data which correlate well with upcoming rain, and reconfigures the memory to utilize this information effectively. This architecture is designed to maximize the ability of the system to scale up to handle real-world problems.
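For readers unfamiliar with sparse distributed memory, the sketch below shows only the basic SDM storage scheme that the Genetic Memory builds on: fixed random "hard locations", activation by Hamming radius, and counter-based storage. It does not implement the genetic reconfiguration of locations described above. The class name, parameter names, and the specific sizes in the usage note are illustrative assumptions.

```python
import random


class SparseDistributedMemory:
    """Minimal Kanerva-style SDM with fixed random hard locations."""

    def __init__(self, n_locations, address_bits, data_bits, radius, seed=0):
        rng = random.Random(seed)
        # Each hard location has a random binary address and one counter per data bit.
        self.addresses = [[rng.randint(0, 1) for _ in range(address_bits)]
                          for _ in range(n_locations)]
        self.counters = [[0] * data_bits for _ in range(n_locations)]
        self.radius = radius

    def _active(self, address):
        """Indices of hard locations within the Hamming radius of address."""
        return [i for i, loc in enumerate(self.addresses)
                if sum(a != b for a, b in zip(loc, address)) <= self.radius]

    def write(self, address, data):
        """Add the data word (as +1/-1 per bit) to every active location's counters."""
        for i in self._active(address):
            row = self.counters[i]
            for j, bit in enumerate(data):
                row[j] += 1 if bit else -1

    def read(self, address):
        """Sum counters over active locations and threshold each bit at zero."""
        active = self._active(address)
        n_bits = len(self.counters[0])
        sums = [sum(self.counters[i][j] for i in active) for j in range(n_bits)]
        return [1 if s > 0 else 0 for s in sums]
```

A typical use is to construct a memory such as `SparseDistributedMemory(200, 32, 8, radius=14)`, call `write(address, data)` for each pattern, and then call `read` with an exact or slightly corrupted address to retrieve the stored word.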


Handwritten Digit Recognition with a Back-Propagation Network

Neural Information Processing Systems

We present an application of back-propagation networks to handwritten digit recognition. Minimal preprocessing of the data was required, but the architecture of the network was highly constrained and specifically designed for the task. The input of the network consists of normalized images of isolated digits. The method has a 1% error rate and about a 9% reject rate on zipcode digits provided by the U.S. Postal Service.

1 INTRODUCTION

The main point of this paper is to show that large back-propagation (BP) networks can be applied to real image-recognition problems without a large, complex preprocessing stage requiring detailed engineering. Unlike most previous work on the subject (Denker et al., 1989), the learning network is directly fed with images, rather than feature vectors, thus demonstrating the ability of BP networks to deal with large amounts of low-level information. Previous work performed on simple digit images (Le Cun, 1989) showed that the architecture of the network strongly influences the network's generalization ability. Good generalization can only be obtained by designing a network architecture that contains a certain amount of a priori knowledge about the problem. The basic design principle is to minimize the number of free parameters that must be determined by the learning algorithm, without overly reducing the computational power of the network.
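The relationship between an error rate and a reject rate can be illustrated with a small helper. This is a sketch under stated assumptions, not the paper's evaluation procedure: it assumes each prediction comes with a scalar confidence, that examples below a threshold are rejected, and that the error rate is measured on the accepted examples only. The function name and arguments are hypothetical.

```python
def error_and_reject_rates(predictions, labels, confidences, threshold):
    """Return (error_rate, reject_rate) for a classifier with a reject option.

    An example is rejected when its confidence is below threshold.
    error_rate is computed over accepted examples only (an assumption of
    this sketch); reject_rate is computed over all examples.
    """
    n = len(labels)
    accepted = [(p, y) for p, y, c in zip(predictions, labels, confidences)
                if c >= threshold]
    reject_rate = (n - len(accepted)) / n
    errors = sum(1 for p, y in accepted if p != y)
    error_rate = errors / len(accepted) if accepted else 0.0
    return error_rate, reject_rate
```

Raising the threshold trades a higher reject rate for a lower error rate on the examples that remain, which is the kind of trade-off the reported figures summarize.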


A Self-organizing Associative Memory System for Control Applications

Neural Information Processing Systems

The CMAC storage scheme has been used as a basis for a software implementation of an associative memory. A major disadvantage of this CMAC concept is that the degree of local generalization (area of interpolation) is fixed. This paper deals with an algorithm for self-organizing variable generalization for the AMS, based on ideas of T. Kohonen.

1 INTRODUCTION

For several years research at the Department of Control Theory and Robotics at the Technical University of Darmstadt has been concerned with the design of a learning real-time control loop with neuron-like associative memories (LERNAS) for the control of unknown, nonlinear processes (Ersue, Tolle, 1988). This control concept uses an associative memory system AMS, based on the cerebellar cortex model CMAC by Albus (Albus, 1972), for the storage of a predictive nonlinear process model and an appropriate nonlinear control strategy (Figure 1).

Figure 1: The learning control loop LERNAS

One problem in adjusting the control loop to a process is, however, to find a suitable set of parameters for the associative memory. The parameters in question determine the degree of generalization within the memory and therefore have a direct influence on the number of training steps required to learn the process behaviour. For good performance of the control loop it is desirable to have a very small generalization around a given setpoint but a large generalization elsewhere.