
Collaborating Authors



Asymptotics of Gradient-based Neural Network Training Algorithms

Neural Information Processing Systems

We study the asymptotic properties of the sequence of iterates of weight-vector estimates obtained by training a multilayer feedforward neural network with a basic gradient-descent method using a fixed learning constant and no batch processing. In the one-dimensional case, an exact analysis establishes the existence of a limiting distribution that is not Gaussian in general. For the general case and a small learning constant, a linearization approximation permits the application of results from the theory of random matrices to again establish the existence of a limiting distribution. We study the first few moments of this distribution to compare and contrast the results of our analysis with those of techniques of stochastic approximation.

1 INTRODUCTION

The wide applicability of neural networks to problems in pattern classification and signal processing has been due to the development of efficient gradient-descent algorithms for the supervised training of multilayer feedforward neural networks with differentiable node functions. A basic version uses a fixed learning constant and updates all weights after each training input is presented (online mode) rather than after the entire training set has been presented (batch mode). The properties of this algorithm, as exhibited by the sequence of iterates, are not yet well understood. There are at present two major approaches.
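The abstract contrasts online updates with batch updates under a fixed learning constant. As a minimal sketch, and not the paper's multilayer setting, the Python below runs online gradient descent on a single linear weight with squared-error loss and hypothetical data y = 2x + noise. Because the learning constant never shrinks, the iterates keep fluctuating around the target weight instead of settling on a point. The names `online_gd_1d`, `w_true`, `eta` and `noise` are illustrative assumptions.

```python
import random
import statistics


def online_gd_1d(w_true=2.0, eta=0.05, steps=20000, noise=0.5, seed=0):
    """Online gradient descent on one linear weight with a fixed learning constant.

    Hypothetical data: y = w_true * x + Gaussian noise, squared-error loss.
    One update per training example (online mode); returns every iterate.
    """
    rng = random.Random(seed)
    w = 0.0
    trace = []
    for _ in range(steps):
        x = rng.gauss(0.0, 1.0)
        y = w_true * x + rng.gauss(0.0, noise)
        # The gradient of 0.5 * (y - w * x) ** 2 with respect to w is -(y - w * x) * x.
        w += eta * (y - w * x) * x
        trace.append(w)
    return trace


if __name__ == "__main__":
    for eta in (0.05, 0.01):
        tail = online_gd_1d(eta=eta)[10000:]  # drop the initial transient
        print(f"eta={eta}: mean={statistics.mean(tail):.3f}, "
              f"std={statistics.pstdev(tail):.4f}")
```

In this toy run the spread of the tail iterates shrinks as `eta` is reduced, which is the small-learning-constant regime the abstract analyzes. The sample mean and standard deviation alone say nothing about whether the limiting distribution is Gaussian.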


A Comparison of Discrete-Time Operator Models for Nonlinear System Identification

Neural Information Processing Systems

We present a unifying view of discrete-time operator models used in the context of finite word length linear signal processing. Comparisons are made between the recently presented gamma operator model and the delta and rho operator models for performing nonlinear system identification and prediction using neural networks. A new model based on an adaptive bilinear transformation, which generalizes all of the above models, is presented.
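One way to make a "unifying view" of operator models concrete, offered here as an illustration rather than the paper's formulation, is to note that the shift, gamma and inverse-delta operators can each be written as a first-order section b/(q - a), where q is the forward shift. The shift gives a tapped delay line (a = 0, b = 1), the gamma memory is mu/(q - (1 - mu)), and since the delta operator is (q - 1)/Delta its inverse is Delta/(q - 1). The rho operator and the paper's adaptive bilinear model are not reproduced, since their forms are not given in the abstract; all function and parameter names are hypothetical.

```python
def operator_taps(u, order, a, b):
    """Tap signals of a cascade of identical first-order sections b / (q - a).

    Each section obeys x_k(t) = a * x_k(t-1) + b * x_{k-1}(t-1), with
    x_0(t) = u(t) and zero initial state. Returns [x_1, ..., x_order].
    """
    n = len(u)
    taps = [list(u)] + [[0.0] * n for _ in range(order)]
    for t in range(1, n):
        for k in range(1, order + 1):
            taps[k][t] = a * taps[k][t - 1] + b * taps[k - 1][t - 1]
    return taps[1:]


def shift_taps(u, order):
    # Shift operator q^-1: a plain tapped delay line.
    return operator_taps(u, order, a=0.0, b=1.0)


def gamma_taps(u, order, mu):
    # Gamma memory mu / (q - (1 - mu)); mu = 1 recovers the delay line.
    return operator_taps(u, order, a=1.0 - mu, b=mu)


def delta_taps(u, order, sample_period):
    # Inverse delta operator: sample_period / (q - 1), a discrete integrator.
    return operator_taps(u, order, a=1.0, b=sample_period)


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    print(gamma_taps(impulse, 2, mu=0.5))
```

In a nonlinear identification setup the tap signals would typically serve as inputs to a static neural network, so changing the operator changes only the memory structure seen by that network.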


Learning Prototype Models for Tangent Distance

Neural Information Processing Systems

Local algorithms such as K-nearest neighbor (NN) perform well in pattern recognition, even though they often assume the simplest distance on the pattern space. It has recently been shown (Simard et al. 1993) that performance can be further improved by incorporating invariance to specific transformations into the underlying distance metric, the so-called tangent distance. The resulting classifier, however, can be prohibitively slow and memory-intensive due to the large number of prototypes that need to be stored and used in the distance comparisons. In this paper we address this problem for the tangent distance algorithm by developing rich models for representing large subsets of the prototypes. Our leading example of a prototype model is a low-dimensional (12) hyperplane defined by a point and a set of basis or tangent vectors.
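The prototype model above is a hyperplane defined by a point and a set of tangent vectors, so measuring how far a pattern lies from it is a least-squares projection. The plain-Python sketch below computes that distance by orthonormalizing the tangent vectors (Gram-Schmidt) and removing their components from the difference vector. It is a simplified one-sided variant that uses tangent vectors only on the prototype side, and it does not show how the paper learns the prototype models; all names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt; vectors that are (numerically) dependent are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = _dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(_dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def one_sided_tangent_distance(x, point, tangents):
    """Euclidean distance from x to the hyperplane point + span(tangents)."""
    r = [xi - pi for xi, pi in zip(x, point)]
    for e in _orthonormalize(tangents):
        c = _dot(r, e)
        r = [ri - c * ei for ri, ei in zip(r, e)]
    return math.sqrt(_dot(r, r))


def nearest_prototype(x, prototypes):
    """prototypes: list of (label, point, tangents); returns the closest label."""
    best = min(prototypes, key=lambda p: one_sided_tangent_distance(x, p[1], p[2]))
    return best[0]
```

With no tangent vectors the function reduces to the ordinary Euclidean distance, which makes the benefit of the invariance directions easy to compare on the same data.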


A Computational Model of Prefrontal Cortex Function

Neural Information Processing Systems

Accumulating data from neurophysiology and neuropsychology have suggested two information processing roles for prefrontal cortex (PFC): 1) short-term active memory; and 2) inhibition. We present a new behavioral task and a computational model which were developed in parallel. The task was developed to probe both of these prefrontal functions simultaneously, and produces a rich set of behavioral data that act as constraints on the model. The model is implemented in continuous time, thus providing a natural framework in which to study the temporal dynamics of processing in the task. We show how the model can be used to examine the behavioral consequences of neuromodulation in PFC. Specifically, we use the model to make novel and testable predictions regarding the behavioral performance of schizophrenics, who are hypothesized to suffer from reduced dopaminergic tone in this brain area.
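The abstract does not give the model's equations. As a hypothetical illustration of two ideas it mentions, continuous-time dynamics and neuromodulation, the sketch below Euler-integrates a single self-exciting sigmoid unit and treats the gain on its net input as a stand-in for neuromodulatory tone. That mapping and every parameter value are assumptions, not the authors' model. With these particular parameters the higher-gain unit holds its activity through the delay after the cue is removed (a crude analogue of active memory), while the lower-gain unit decays.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_unit(gain, w_self=8.0, bias=-4.0, cue=4.0, tau=1.0,
                  dt=0.01, t_cue=10.0, t_delay=20.0):
    """Euler integration of tau * dy/dt = -y + sigmoid(gain * (w_self * y + I(t)) + bias).

    I(t) equals `cue` for the first t_cue time units and 0 during the following
    delay of t_delay time units. `gain` scales the net input and stands in,
    hypothetically, for neuromodulatory tone. Returns y at the end of the delay.
    """
    n_cue = int(round(t_cue / dt))
    n_total = n_cue + int(round(t_delay / dt))
    y = 0.0
    for step in range(n_total):
        inp = cue if step < n_cue else 0.0
        y += dt * (-y + sigmoid(gain * (w_self * y + inp) + bias)) / tau
    return y


if __name__ == "__main__":
    for g in (1.0, 0.5):
        print(f"gain={g}: activity after delay = {simulate_unit(g):.3f}")
```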



Stochastic Dynamics of Three-State Neural Networks

Neural Information Processing Systems

We present here an analysis of the stochastic neurodynamics of a neural network composed of three-state neurons described by a master equation. An outer-product representation of the master equation is employed. In this representation, the analysis is easily extended from two-state to three-state neurons. We apply this formalism, together with approximation schemes, to a simple three-state network and compare the results with Monte Carlo simulations.
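The comparison with Monte Carlo simulations can be illustrated with a small asynchronous simulation of neurons taking values in {-1, 0, +1}. The abstract does not state the transition rates, so the sketch assumes a heat-bath rule: a randomly chosen neuron is resampled with probability proportional to exp(beta * a * h) for each state a, where h is its local field. The coupling matrix `J`, thresholds `theta` and inverse temperature `beta` are hypothetical parameters.

```python
import math
import random

STATES = (-1, 0, 1)


def heat_bath_step(s, J, theta, beta, rng):
    """One asynchronous update (hypothetical heat-bath rule), modifying s in place.

    Picks neuron i at random, computes h_i = sum_{j != i} J[i][j] * s[j] + theta[i],
    and resamples s[i] with P(s_i = a) proportional to exp(beta * a * h_i).
    """
    i = rng.randrange(len(s))
    h = sum(J[i][j] * s[j] for j in range(len(s)) if j != i) + theta[i]
    weights = [math.exp(beta * a * h) for a in STATES]
    s[i] = rng.choices(STATES, weights=weights)[0]


def simulate(J, theta, beta, steps, seed=0):
    """Run `steps` updates from the all-zero state; return the state after each."""
    rng = random.Random(seed)
    s = [0] * len(theta)
    history = []
    for _ in range(steps):
        heat_bath_step(s, J, theta, beta, rng)
        history.append(list(s))
    return history
```

Time averages of the simulated states (for example the mean activity, or the fraction of neurons in the 0 state) are the kind of quantity that can then be set against approximations derived from the master equation.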


Capacity and Information Efficiency of a Brain-like Associative Net

Neural Information Processing Systems

Bruce Graham and David Willshaw
Centre for Cognitive Science, University of Edinburgh
2 Buccleuch Place, Edinburgh, EH8 9LW, UK
Email: bruce@cns.ed.ac.uk, david@cns.ed.ac.uk

Abstract

We have determined the capacity and information efficiency of an associative net configured in a brain-like way with partial connectivity and noisy input cues. Recall theory was used to calculate the capacity when pattern recall is achieved using a winners-take-all strategy. Transforming the dendritic sum according to input activity and unit usage can greatly increase the capacity of the associative net under these conditions. This corresponds to the level of connectivity commonly seen in the brain and invites speculation that the brain is connected in the most information-efficient way.

1 INTRODUCTION

Standard network associative memories become more plausible as models of associative memory in the brain if they incorporate (1) partial connectivity, (2) sparse activity and (3) recall from noisy cues. In this paper we consider the capacity of a binary associative net (Willshaw, Buneman, & Longuet-Higgins, 1969; Willshaw, 1971; Buckingham, 1991) containing these features. While the associative net is a very simple model of associative memory, its behaviour as a storage device is not trivial, yet it is tractable to theoretical analysis. We are able to calculate the capacity of the net in different configurations and with different pattern recall strategies.
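The ingredients named above (binary clipped Hebbian weights, partial connectivity, noisy cues and winners-take-all recall) can be sketched in plain Python with patterns stored as sets of active unit indices. The optional normalization divides each dendritic sum by the number of active cue inputs physically connected to that unit. This is one simple, hypothetical reading of "transforming the dendritic sum according to input activity"; it ignores unit usage and is not claimed to be the authors' transform.

```python
import random


def train(pairs, n_in, n_out):
    """Clipped Hebbian storage: w[i][j] = 1 if input unit i and output unit j
    were ever active together in a stored (input set, output set) pair."""
    w = [[0] * n_out for _ in range(n_in)]
    for x, y in pairs:
        for i in x:
            for j in y:
                w[i][j] = 1
    return w


def random_mask(n_in, n_out, p, seed=0):
    """Partial connectivity: each input-output connection exists with probability p."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(n_out)] for _ in range(n_in)]


def recall_wta(cue, w, mask, k, normalize=False):
    """Winners-take-all recall of k output units from a (possibly noisy) cue set."""
    scores = []
    for j in range(len(w[0])):
        # Dendritic sum: active cue inputs reaching j through a connection with weight 1.
        d = sum(w[i][j] * mask[i][j] for i in cue)
        if normalize:
            connected = sum(mask[i][j] for i in cue)
            d = d / connected if connected else 0.0
        scores.append((d, j))
    # Highest sums win; ties are broken by lower index so results are deterministic.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return {j for _, j in scores[:k]}
```

With full connectivity and disjoint stored patterns, a cue with one wrong active unit still recalls the stored output set, because the correct output units keep the largest dendritic sums.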


On-line Learning of Dichotomies

Neural Information Processing Systems

The performance of online algorithms for learning dichotomies is studied. In online learning, the number of examples P is equivalent to the learning time, since each example is presented only once. The learning curve, or generalization error as a function of P, depends on the schedule by which the learning rate is lowered. For a target that is a perceptron rule, the learning curve of the perceptron algorithm can decrease as fast as 1/P if the schedule is optimized. If the target is not realizable by a perceptron, the perceptron algorithm does not generally converge to the solution with the lowest generalization error.
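A learning curve of this kind can be traced with a small simulation: a teacher perceptron labels fresh Gaussian inputs, the student applies the perceptron rule once per example, and the generalization error is computed from the angle between student and teacher. For isotropic inputs the disagreement probability is arccos(R)/pi, where R is the normalized overlap. The `schedule` argument and the defaults are hypothetical; the sketch does not implement the optimized schedule the abstract refers to.

```python
import math
import random


def generalization_error(w, teacher):
    """Probability that sign(w.x) != sign(teacher.x) for isotropic inputs x."""
    nw = math.sqrt(sum(a * a for a in w))
    nt = math.sqrt(sum(a * a for a in teacher))
    if nw == 0.0 or nt == 0.0:
        return 0.5  # an all-zero student is no better than guessing
    r = sum(a * b for a, b in zip(w, teacher)) / (nw * nt)
    r = max(-1.0, min(1.0, r))  # guard against rounding just outside [-1, 1]
    return math.acos(r) / math.pi


def online_perceptron(teacher, n_examples, schedule=lambda t: 1.0, seed=0):
    """Online perceptron learning: every example is fresh and seen exactly once.

    schedule(t) is the learning rate for example t. Returns the learning curve
    (generalization error after each example).
    """
    rng = random.Random(seed)
    n = len(teacher)
    w = [0.0] * n
    curve = []
    for t in range(n_examples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        label = 1 if sum(a * b for a, b in zip(teacher, x)) >= 0 else -1
        output = 1 if sum(a * b for a, b in zip(w, x)) >= 0 else -1
        if output != label:  # perceptron rule: update only on mistakes
            eta = schedule(t)
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
        curve.append(generalization_error(w, teacher))
    return curve
```

Starting from zero weights, a constant learning rate only rescales w under the pure perceptron rule and leaves the learning curve unchanged, so the schedule matters only when the rate actually varies with t.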


New Algorithms for 2D and 3D Point Matching: Pose Estimation and Correspondence

Neural Information Processing Systems

A fundamental open problem in computer vision, determining pose and correspondence between two sets of points in space, is solved with a novel, robust and easily implementable algorithm. The technique works on noisy point sets that may be of unequal sizes and may differ by nonrigid transformations. A 2D variation calculates the pose between point sets related by an affine transformation: translation, rotation, scale and shear. A 3D-to-3D variation calculates translation and rotation. An objective describing the problem is derived from mean field theory. The objective is minimized with clocked (EM-like) dynamics. Experiments with both handwritten and synthetic data provide empirical evidence for the method.

1 Introduction
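The clocked, EM-like dynamics alternate between two subproblems: soft correspondences given the current pose, and the pose given those correspondences. The sketch below shows that alternation for a simplified case. Match weights exp(-beta * squared distance) are normalized by Sinkhorn iterations (softassign), the pose is a 2D rotation plus translation solved by weighted Procrustes, and beta is raised each round. Rigid rather than affine pose, equal-size point sets without outlier handling, and all constants are assumptions of this sketch, not the paper's mean-field objective.

```python
import math


def transform(p, theta, t):
    """Rotate point p by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


def sinkhorn(m, iters=50):
    """Softassign: alternate row and column normalization in place so the
    match matrix becomes (approximately) doubly stochastic."""
    for _ in range(iters):
        for row in m:
            s = sum(row) or 1.0  # leave an all-zero row at zero
            row[:] = [v / s for v in row]
        for j in range(len(m[0])):
            s = sum(row[j] for row in m) or 1.0
            for row in m:
                row[j] /= s
    return m


def solve_rigid_2d(xs, ys, m):
    """Weighted 2D Procrustes: theta and t minimizing
    sum_ij m[i][j] * |x_i - (R(theta) y_j + t)|^2."""
    pairs = [(m[i][j], xs[i], ys[j]) for i in range(len(xs)) for j in range(len(ys))]
    w = sum(mij for mij, _, _ in pairs)
    xb = [sum(mij * x[d] for mij, x, _ in pairs) / w for d in (0, 1)]
    yb = [sum(mij * y[d] for mij, _, y in pairs) / w for d in (0, 1)]
    a = b = 0.0
    for mij, x, y in pairs:
        dx, dy = x[0] - xb[0], x[1] - xb[1]
        ex, ey = y[0] - yb[0], y[1] - yb[1]
        a += mij * (ex * dx + ey * dy)  # weighted dot(y', x')
        b += mij * (ex * dy - ey * dx)  # weighted cross(y', x')
    theta = math.atan2(b, a)
    rx, ry = transform(yb, theta, (0.0, 0.0))
    return theta, (xb[0] - rx, xb[1] - ry)


def match_points(xs, ys, beta0=0.1, rate=1.5, outer=20):
    """Alternate softassign (correspondences given pose) and a pose solve
    (pose given correspondences) while raising the inverse temperature beta."""
    theta, t, beta = 0.0, (0.0, 0.0), beta0
    m = []
    for _ in range(outer):
        moved = [transform(y, theta, t) for y in ys]
        m = []
        for x in xs:
            d = [(x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2 for p in moved]
            lo = min(d)  # subtract the row minimum so exp() cannot underflow a whole row
            m.append([math.exp(-beta * (dij - lo)) for dij in d])
        sinkhorn(m)
        theta, t = solve_rigid_2d(xs, ys, m)
        beta *= rate
    return theta, t, m
```

Because the data points may be listed in any order, the returned match matrix `m` also recovers the correspondence: as beta grows, each row concentrates on the matching model point.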


Unsupervised Classification of 3D Objects from 2D Views

Neural Information Processing Systems

Satoshi Suzuki and Hiroshi Ando
ATR Human Information Processing Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan
satoshi@hip.atr.co.jp, ando@hip.atr.co.jp

Abstract

This paper presents an unsupervised learning scheme for categorizing 3D objects from their 2D projected images. The scheme exploits an auto-associative network's ability to encode each view of a single object into a representation that indicates its view direction. We propose two models that employ different classification mechanisms: the first model selects the auto-associative network whose recovered view best matches the input view, and the second model is based on a modular architecture whose additional network classifies the views by splitting the input space nonlinearly. We demonstrate the effectiveness of the proposed classification models through simulations using 3D wire-frame objects.

1 INTRODUCTION

The human visual system can recognize various 3D (three-dimensional) objects from their 2D (two-dimensional) retinal images, although the images vary significantly as the viewpoint changes. Recent computational models have explored how to learn to recognize 3D objects from their projected views (Poggio & Edelman, 1990). Most existing models, however, are based on supervised learning; that is, during training a teacher indicates which object each view belongs to.
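The selection rule of the first model, which picks the auto-associative network whose recovered view best matches the input, can be sketched as an argmin over reconstruction errors. In this sketch the trained nonlinear networks are replaced by hypothetical rank-one linear auto-associators (projection onto a single unit vector), and the unsupervised training of the modules is not shown.

```python
from functools import partial


def reconstruct_rank1(view, direction):
    """Hypothetical linear auto-associator with one hidden unit: project the
    view onto a unit-length direction and map it back (w w^T x)."""
    c = sum(v * d for v, d in zip(view, direction))
    return [c * d for d in direction]


def select_module(view, modules):
    """Return the index of the module whose recovered view best matches the
    input view, measured by squared reconstruction error."""
    def error(module):
        recovered = module(view)
        return sum((v - r) ** 2 for v, r in zip(view, recovered))
    return min(range(len(modules)), key=lambda k: error(modules[k]))


modules = [partial(reconstruct_rank1, direction=(1.0, 0.0)),
           partial(reconstruct_rank1, direction=(0.0, 1.0))]
print(select_module([3.0, 0.1], modules))  # prints 0
```

Any callable that maps a view to a recovered view can be used as a module, so a trained network could replace the linear stand-ins without changing the selection rule.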