Recurrent Networks: Second Order Properties and Pruning

Neural Information Processing Systems

Second order properties of cost functions for recurrent networks are investigated. We analyze a layered fully recurrent architecture; its virtue is that it includes the conventional feedforward architecture as a special case. A detailed description of the recursive computation of the full Hessian of the network cost function is provided. We discuss the possibility of invoking simplifying approximations of the Hessian and show how weight decays iron the cost function and thereby greatly assist training. We present tentative pruning results, using Hassibi et al.'s Optimal Brain Surgeon, demonstrating that recurrent networks can construct an efficient internal memory.
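
As a concrete illustration of the pruning step this abstract invokes, the following is a minimal sketch of the Optimal Brain Surgeon saliency and weight-update rule of Hassibi et al., assuming a flattened weight vector w and the full Hessian H of the cost (computed recursively, as described in the paper) are available and H is invertible; the function name and interface are illustrative, not the paper's code.

```python
import numpy as np

def obs_prune_step(w, H):
    """One Optimal Brain Surgeon step: delete the least salient weight q,
    with saliency L_q = w_q^2 / (2 [H^-1]_qq), and optimally adjust the
    remaining weights by dw = -w_q * H^-1 e_q / [H^-1]_qq."""
    H_inv = np.linalg.inv(H)
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    q = int(np.argmin(saliency))               # least salient weight
    dw = -w[q] * H_inv[:, q] / H_inv[q, q]     # compensating update
    w_new = w + dw
    w_new[q] = 0.0                             # weight q is pruned exactly
    return w_new, q, float(saliency[q])
```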


Recognizing Handwritten Digits Using Mixtures of Linear Models

Neural Information Processing Systems

We construct a mixture of locally linear generative models of a collection of pixel-based images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.
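
A minimal sketch of this scheme, under simplifying assumptions: one probabilistic-PCA-style Gaussian per digit (the paper trains a mixture of several such models per digit with EM), with a new image assigned to the class whose model gives the highest log-likelihood. The comment marks where tangent vectors would be added to the sample covariance; all names are illustrative.

```python
import numpy as np

def fit_ppca(X, k, eps=1e-6):
    """Fit one locally linear (PPCA-style) model to images of one class.
    X: (n, d) array of flattened images; k: subspace dimension (k < d)."""
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)   # sample covariance; tangent vectors
                                       # would be added into S here
    evals, evecs = np.linalg.eigh(S)
    evals, evecs = evals[::-1], evecs[:, ::-1]    # descending order
    sigma2 = max(evals[k:].mean(), eps)           # off-subspace variance
    U, lam = evecs[:, :k], np.maximum(evals[:k], eps)
    return mu, U, lam, sigma2

def log_likelihood(x, model):
    """Gaussian log-likelihood of flattened image x under a fitted model."""
    mu, U, lam, sigma2 = model
    d, k = mu.size, lam.size
    z = x - mu
    proj = U.T @ z                     # coordinates within the subspace
    resid = z @ z - proj @ proj        # squared residual off the subspace
    mahal = np.sum(proj ** 2 / lam) + resid / sigma2
    logdet = np.sum(np.log(lam)) + (d - k) * np.log(sigma2)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + mahal)
```

Classification is then argmax over the per-class models of log_likelihood(x, model) plus the log class prior.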


Extracting Rules from Artificial Neural Networks with Distributed Representations

Neural Information Processing Systems

Although artificial neural networks have been applied in a variety of real-world scenarios with remarkable success, they have often been criticized for exhibiting a low degree of human comprehensibility. Techniques that compile compact sets of symbolic rules out of artificial neural networks offer a promising perspective to overcome this obvious deficiency of neural network representations. This paper presents an approach to the extraction of if-then rules from artificial neural networks. Its key mechanism is validity interval analysis, which is a generic tool for extracting symbolic knowledge by propagating rule-like knowledge through Backpropagation-style neural networks. Empirical studies in a robot arm domain illustrate the appropriateness of the proposed method for extracting rules from networks with real-valued and distributed representations.
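
The forward half of validity interval analysis can be sketched with plain interval arithmetic: given lower and upper bounds on one layer's activations, bounds on the next layer follow by routing each bound through the sign of the corresponding weight and then through the monotone sigmoid. This shows only forward propagation; VI-analysis as described also refines intervals (including backwards) via linear programming, which is omitted here. Names are illustrative.

```python
import numpy as np

def propagate_intervals(lo, hi, W, b):
    """Propagate activation intervals [lo, hi] through one sigmoidal layer.
    Positive weights pair with the matching bound, negative weights with
    the opposite bound, giving valid pre-activation intervals."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    pre_lo = Wp @ lo + Wn @ hi + b
    pre_hi = Wp @ hi + Wn @ lo + b
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    return sig(pre_lo), sig(pre_hi)    # sigmoid is monotone increasing
```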


Correlation and Interpolation Networks for Real-time Expression Analysis/Synthesis

Neural Information Processing Systems

We describe a framework for real-time tracking of facial expressions that uses neurally inspired correlation and interpolation methods. A distributed view-based representation is used to characterize facial state, and is computed using a replicated correlation network. The ensemble response of the set of view correlation scores is input to a network-based interpolation method, which maps perceptual state to motor control states for a simulated 3-D face model. Activation levels of the motor state correspond to muscle activations in an anatomically derived model. By integrating fast and robust 2-D processing with 3-D models, we obtain a system that is able to track and interpret complex facial motions in real time.
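
A toy sketch of the two stages described above, under assumed interfaces: normalized correlation of an image patch against stored view templates (all the same size), followed by radial-basis interpolation from the score vector to motor (muscle) activations. The paper's networks are replicated and run in real time, which this sketch does not attempt; all names and parameters are illustrative.

```python
import numpy as np

def view_scores(patch, templates):
    """Normalized correlation of a patch against each view template;
    the resulting score vector characterizes facial state."""
    p = (patch - patch.mean()) / (patch.std() + 1e-8)
    scores = []
    for t in templates:
        tt = (t - t.mean()) / (t.std() + 1e-8)
        scores.append(np.mean(p * tt))
    return np.array(scores)

def interpolate_motor(scores, example_scores, example_motors, width=0.5):
    """RBF interpolation from view-score space to motor control states.
    example_scores: (m, k) scores of training examples; example_motors:
    (m, n_muscles) their associated muscle activations."""
    d2 = np.sum((example_scores - scores) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * width ** 2))
    w /= w.sum() + 1e-12
    return w @ example_motors
```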


Dynamic Cell Structures

Neural Information Processing Systems

Dynamic Cell Structures (DCS) represent a family of artificial neural architectures suited for both unsupervised and supervised learning. They belong to the recently introduced class of Topology Representing Networks (TRN) [Martinetz94], which build perfectly topology-preserving feature maps. DCS employ a modified Kohonen learning rule in conjunction with competitive Hebbian learning. The Kohonen-type learning rule serves to adjust the synaptic weight vectors, while Hebbian learning establishes a dynamic lateral connection structure between the units reflecting the topology of the feature manifold.
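
A compact sketch of one DCS adaptation step under the two rules just named: competitive Hebbian learning connects the best- and second-best-matching units (decaying and pruning the winner's other lateral connections), while the Kohonen-type rule moves the winner and its current topological neighbours toward the input. Learning rates and thresholds are illustrative, not the paper's settings.

```python
import numpy as np

def dcs_step(x, W, C, eps_b=0.1, eps_n=0.01, forget=0.9, theta=0.05):
    """One adaptation step. x: input vector; W: (n, d) unit weight
    vectors; C: (n, n) symmetric lateral connection strengths."""
    dist = np.linalg.norm(W - x, axis=1)
    b, s = np.argsort(dist)[:2]          # best and second-best units
    # Competitive Hebbian learning: link bmu and second bmu, decay rest
    C[b] *= forget
    C[:, b] *= forget
    C[b, s] = C[s, b] = 1.0
    C[C < theta] = 0.0                   # prune weak lateral connections
    # Kohonen-type rule: winner and its neighbours move toward x
    W[b] += eps_b * (x - W[b])
    nbrs = np.nonzero(C[b])[0]
    W[nbrs] += eps_n * (x - W[nbrs])
    return W, C
```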


A solvable connectionist model of immediate recall of ordered lists

Neural Information Processing Systems

A model of short-term memory for serially ordered lists of verbal stimuli is proposed as an implementation of the 'articulatory loop' thought to mediate this type of memory (Baddeley, 1986). The model predicts the presence of a repeatable, time-varying 'context' signal coding the timing of items' presentation, in addition to a store of phonological information and a process of serial rehearsal. Items are associated with context nodes and phonemes by Hebbian connections showing both short- and long-term plasticity. Items are activated by phonemic input during presentation and reactivated by context and phonemic feedback during output. Serial selection of items occurs via a winner-take-all interaction amongst items, with the winner subsequently receiving decaying inhibition. An approximate analysis of error probabilities due to Gaussian noise during output is presented. The model provides an explanatory account of the probability of error as a function of serial position, list length, word length, phonemic similarity, temporal grouping, and item and list familiarity, and is proposed as the starting point for a model of rehearsal and vocabulary acquisition.
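
The serial selection mechanism lends itself to a small sketch: at each output step a winner-take-all picks the most active item, and the winner then receives inhibition that decays over subsequent steps; adding Gaussian output noise produces the kind of order errors the abstract analyzes. Parameters and names are illustrative.

```python
import numpy as np

def recall_sequence(strengths, n_out, decay=0.8, suppress=1.5,
                    noise=0.1, seed=0):
    """Winner-take-all recall with decaying post-selection inhibition.
    strengths: activation of each item (e.g., from context/phonemic
    cues); Gaussian noise on output causes occasional order errors."""
    rng = np.random.default_rng(seed)
    inhibition = np.zeros_like(strengths, dtype=float)
    order = []
    for _ in range(n_out):
        a = strengths - inhibition + noise * rng.standard_normal(len(strengths))
        winner = int(np.argmax(a))       # winner-take-all selection
        order.append(winner)
        inhibition *= decay              # earlier inhibition fades
        inhibition[winner] += suppress   # suppress the current winner
    return order
```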


Direction Selectivity In Primary Visual Cortex Using Massive Intracortical Connections

Neural Information Processing Systems

Almost all models of orientation and direction selectivity in visual cortex are based on feedforward connection schemes, where geniculate input provides all excitation to both pyramidal and inhibitory neurons. The latter neurons then suppress the response of the former for non-optimal stimuli. However, anatomical studies show that up to 90% of the excitatory synaptic input onto any cortical cell is provided by other cortical cells. The massive excitatory feedback nature of cortical circuits is embedded in the canonical microcircuit of Douglas & Martin (1991). Here we investigate, analytically and through biologically realistic simulations, the functioning of a detailed model of this circuitry, operating in a hysteretic mode. In the model, weak geniculate input is dramatically amplified by intracortical excitation, while inhibition has a dual role: (i) to prevent the early geniculate-induced excitation in the null direction, and (ii) to restrain excitation and ensure that the neurons fire only when the stimulus is in their receptive field.
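
The amplification argument can be made concrete with a two-population rate model: in the linear regime the steady-state response to geniculate drive I is I / (1 - w_exc + w_inh), so weak input is strongly amplified when recurrent excitation w_exc is large, and inhibition w_inh restrains the gain. This toy model is only a stand-in for the paper's detailed biologically realistic simulation; all parameters are illustrative.

```python
def cortical_amplifier(I_lgn, w_exc=1.2, w_inh=0.5, steps=2000, dt=0.01):
    """Toy excitatory/inhibitory rate model of recurrent amplification.
    Linear steady state: r = I_lgn / (1 - w_exc + w_inh), here a gain
    of ~3.3, illustrating amplification of weak geniculate input."""
    r, r_inh = 0.0, 0.0
    for _ in range(steps):
        drive = I_lgn + w_exc * r - w_inh * r_inh
        r += dt * (-r + max(0.0, drive))   # excitatory population
        r_inh += dt * (-r_inh + r)         # inhibitory population tracks r
    return r
```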


Plasticity-Mediated Competitive Learning

Neural Information Processing Systems

Differentiation between the nodes of a competitive learning network is conventionally achieved through competition on the basis of neural activity. Simple inhibitory mechanisms are limited to sparse representations, while decorrelation and factorization schemes that support distributed representations are computationally unattractive. By letting neural plasticity mediate the competitive interaction instead, we obtain diffuse, nonadaptive alternatives for fully distributed representations. We use this technique to simplify and improve our binary information gain optimization algorithm for feature extraction (Schraudolph and Sejnowski, 1993); the same approach could be used to improve other learning algorithms.


Using Voice Transformations to Create Additional Training Talkers for Word Spotting

Neural Information Processing Systems

Lack of training data has always been a constraint in training speech recognizers. This research presents a voice transformation technique which increases the variety among training talkers. The resulting more varied training set provided up to 2.9 percentage points of improvement in the figure of merit (average detection rate) of a high-performance word spotter. This improvement is similar to the increase in performance provided by doubling the amount of training data (Carlson, 1994). This technique can also be applied to other speech recognition systems such as continuous speech recognition, talker identification, and isolated speech recognition.
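
The abstract does not describe the transformation itself, so as a generic stand-in (not the paper's method), here is the simplest kind of talker augmentation: linear resampling of the waveform, which jointly warps pitch, timbre, and duration to yield a crude synthetic "new talker" from an existing recording.

```python
import numpy as np

def warp_talker(signal, factor=1.1):
    """Resample a 1-D waveform by `factor` via linear interpolation.
    Played back at the original sample rate, factor > 1 raises the
    apparent pitch; used here only as an illustrative augmentation."""
    n = int(len(signal) / factor)
    idx = np.linspace(0.0, len(signal) - 1.0, n)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, len(signal) - 1)
    frac = idx - lo
    return (1.0 - frac) * signal[lo] + frac * signal[hi]
```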


Inferring Ground Truth from Subjective Labelling of Venus Images

Neural Information Processing Systems

In practical situations, experts may visually examine the images and provide a subjective, noisy estimate of the truth. Calibrating the reliability and bias of expert labellers is a nontrivial problem. In this paper we discuss some of our recent work on this topic in the context of detecting small volcanoes in Magellan SAR images of Venus. Empirical results (using the Expectation-Maximization procedure) suggest that accounting for subjective noise can be quite significant in terms of quantifying both human and algorithm detection performance.
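
A minimal sketch of the kind of EM procedure mentioned: a binary Dawid-Skene-style model in which each expert has an unknown hit rate and false-alarm rate, and the posterior probability that each image contains a volcano is re-estimated jointly with those rates. This is an assumed formulation for illustration, not necessarily the paper's exact model.

```python
import numpy as np

def em_ground_truth(L, n_iter=50):
    """EM over a (n_images, n_experts) 0/1 label matrix L.
    Returns posterior P(volcano) per image and each expert's estimated
    hit rate and false-alarm rate (binary Dawid-Skene-style model)."""
    z = L.mean(axis=1)                   # initial P(volcano) per image
    for _ in range(n_iter):
        # M-step: expert hit/false-alarm rates and the class prior
        hit = (L * z[:, None]).sum(0) / (z.sum() + 1e-12)
        fa = (L * (1 - z)[:, None]).sum(0) / ((1 - z).sum() + 1e-12)
        prior = z.mean()
        # E-step: posterior P(volcano | expert labels) per image
        log1 = np.log(prior + 1e-12) + (
            L @ np.log(hit + 1e-12) + (1 - L) @ np.log(1 - hit + 1e-12))
        log0 = np.log(1 - prior + 1e-12) + (
            L @ np.log(fa + 1e-12) + (1 - L) @ np.log(1 - fa + 1e-12))
        z = 1.0 / (1.0 + np.exp(log0 - log1))
    return z, hit, fa
```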