Neural Network - Gaussian Mixture Hybrid for Speech Recognition or Density Estimation

Neural Information Processing Systems

The subject of this paper is the integration of multi-layered Artificial Neural Networks (ANN) with probability density functions such as the Gaussian mixtures found in continuous density Hidden Markov Models (HMM). In the first part of this paper we present an ANN/HMM hybrid in which all the parameters of the system are simultaneously optimized with respect to a single criterion. In the second part of this paper, we study the relationship between the density of the inputs of the network and the density of the outputs of the network. A few experiments are presented to explore how to perform density estimation with ANNs.

1 INTRODUCTION

This paper studies the integration of Artificial Neural Networks (ANN) with probability density functions (pdf) such as the Gaussian mixtures often used in continuous density Hidden Markov Models. The ANNs considered here are multi-layered or recurrent networks with hyperbolic tangent hidden units.
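
As a concrete illustration of the single-criterion idea, here is a minimal sketch in which a tanh network feeds a Gaussian mixture and one scalar log-likelihood is climbed by finite-difference gradient ascent over network weights and mixture means alike. All architectural and training details below are assumptions for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def net(x, W1, W2):
    # One-hidden-layer network with hyperbolic tangent hidden units.
    return np.tanh(x @ W1) @ W2

def mixture_loglik(y, means, var, pi):
    # Mean log-density of rows of y under a spherical Gaussian mixture.
    d2 = ((y[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    dens = (pi * np.exp(-0.5 * d2 / var)
            / (2 * np.pi * var) ** (y.shape[1] / 2)).sum(1)
    return np.log(dens + 1e-12).mean()

x = rng.normal(size=(32, 3))
W1 = 0.1 * rng.normal(size=(3, 5))
W2 = 0.1 * rng.normal(size=(5, 2))
means = rng.normal(size=(4, 2))
pi, var = np.ones(4) / 4, 1.0

def criterion(flat):
    # Single scalar criterion over ALL parameters, ANN and mixture alike.
    n1, n2 = W1.size, W2.size
    w1 = flat[:n1].reshape(W1.shape)
    w2 = flat[n1:n1 + n2].reshape(W2.shape)
    mu = flat[n1 + n2:].reshape(means.shape)
    return mixture_loglik(net(x, w1, w2), mu, var, pi)

theta = np.concatenate([W1.ravel(), W2.ravel(), means.ravel()])
eps = 1e-5
for _ in range(100):  # simple finite-difference gradient ascent
    grad = np.array([(criterion(theta + eps * e) - criterion(theta - eps * e))
                     / (2 * eps)
                     for e in np.eye(theta.size)])
    theta += 0.1 * grad

print("mean log-likelihood after training:", criterion(theta))
```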


Information Processing to Create Eye Movements

Neural Information Processing Systems

Because eye muscles never cocontract and do not deal with external loads, one can write an equation that relates motoneuron firing rate to eye position and velocity - a very uncommon situation in the CNS. The semicircular canals transduce head velocity in a linear manner by using a high background discharge rate, imparting linearity to the premotor circuits that generate eye movements. This has made it possible to deduce some of the signal processing involved, including a neural network that integrates. These ideas are often summarized by block diagrams. Unfortunately, they are of little value in describing the behavior of single neurons - a finding supported by neural network models.
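
The position/velocity relation alluded to is usually written in the form popularized by Robinson (the coefficients below are schematic placeholders):

$$ R(t) = R_0 + k\,E(t) + r\,\frac{dE(t)}{dt} $$

where $R$ is the motoneuron firing rate, $E$ the eye position, $R_0$ a baseline discharge rate, and $k$ and $r$ the unit's position and velocity sensitivities.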


Generalization Performance in PARSEC - A Structured Connectionist Parsing Architecture

Neural Information Processing Systems

This paper presents PARSEC, a system for generating connectionist parsing networks from example parses. PARSEC is not based on formal grammar systems and is geared toward spoken language tasks. PARSEC networks exhibit three strengths important for application to speech processing: 1) they learn to parse, and generalize well compared to hand-coded grammars; 2) they tolerate several types of noise; 3) they can learn to use multi-modal input. Presented are the PARSEC architecture and performance analyses along several dimensions that demonstrate PARSEC's features. PARSEC's performance is compared to that of traditional grammar-based parsing systems.

1 INTRODUCTION

While a great deal of research has been done developing parsers for natural language, adequate solutions for some of the particular problems involved in spoken language have not been found. Among the unsolved problems are the difficulty of constructing task-specific grammars, lack of tolerance to noisy input, and inability to effectively utilize non-symbolic information. This paper describes PARSEC, a system for generating connectionist parsing networks from example parses.


Combined Neural Network and Rule-Based Framework for Probabilistic Pattern Recognition and Discovery

Neural Information Processing Systems

A combined neural network and rule-based approach is suggested as a general framework for pattern recognition. This approach enables unsupervised and supervised learning, respectively, while providing probability estimates for the output classes. The probability maps are utilized for higher-level analysis, such as feedback for smoothing over the output label maps and the identification of unknown patterns (pattern "discovery"). The suggested approach is presented and demonstrated on the texture-analysis task. A correct classification rate in the 90th percentile is achieved for both unstructured and structured natural texture mosaics. The advantages of the probabilistic approach to pattern analysis are demonstrated.
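
A minimal sketch of the two uses of probability maps described above, assuming a simple box-filter smoother and a confidence threshold for flagging unknowns (both are illustrative choices, not the paper's):

```python
import numpy as np

def smooth(prob_map, k=3):
    # Box-filter each per-class probability map with a k x k neighborhood.
    pad = k // 2
    p = np.pad(prob_map, ((0, 0), (pad, pad), (pad, pad)), mode='edge')
    out = np.zeros_like(prob_map)
    for dy in range(k):
        for dx in range(k):
            out += p[:, dy:dy + prob_map.shape[1], dx:dx + prob_map.shape[2]]
    return out / (k * k)

rng = np.random.default_rng(1)
# Fake per-pixel class posteriors for 4 classes on a 32 x 32 image.
probs = rng.dirichlet(np.ones(4), size=(32, 32)).transpose(2, 0, 1)

smoothed = smooth(probs)
labels = smoothed.argmax(0)       # smoothed output label map
unknown = smoothed.max(0) < 0.4   # low confidence => candidate "discovery"
print("fraction flagged unknown:", unknown.mean())
```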


Shooting Craps in Search of an Optimal Strategy for Training Connectionist Pattern Classifiers

Neural Information Processing Systems

We compare two strategies for training connectionist (as well as non-connectionist) models for statistical pattern recognition. The probabilistic strategy is based on the notion that Bayesian discrimination (i.e., optimal classification) is achieved when the classifier learns the a posteriori class distributions of the random feature vector. The differential strategy is based on the notion that the identity of the class with the largest a posteriori probability of the feature vector is all that is needed to achieve Bayesian discrimination. Each strategy is directly linked to a family of objective functions that can be used in the supervised training procedure. We prove that the probabilistic strategy, linked with error-measure objective functions such as mean-squared error and cross-entropy typically used to train classifiers, necessarily requires larger training sets and more complex classifier architectures than those needed to approximate the Bayesian discriminant function. In contrast.
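
The contrast between the two families can be made concrete with a toy sketch. The cross-entropy form is standard; the differential objective below (a sigmoid of the margin between the correct output and its strongest rival) is only in the spirit of the strategy, not necessarily the exact objective studied:

```python
import numpy as np

def cross_entropy(outputs, target_idx):
    # Probabilistic family: pushes outputs toward the class posteriors.
    p = np.exp(outputs) / np.exp(outputs).sum()
    return -np.log(p[target_idx] + 1e-12)

def differential(outputs, target_idx, alpha=4.0):
    # Differential family: only the identity of the winner matters, so
    # penalize small margins over the strongest rival (assumed form).
    rivals = np.delete(outputs, target_idx)
    delta = outputs[target_idx] - rivals.max()
    return 1.0 / (1.0 + np.exp(alpha * delta))  # small when margin is large

outputs = np.array([0.2, 1.1, 0.9])
print(cross_entropy(outputs, 1), differential(outputs, 1))
```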


Tangent Prop - A formalism for specifying selected invariances in an adaptive network

Neural Information Processing Systems

In many machine learning applications, one has access not only to training data, but also to some high-level a priori knowledge about the desired behavior of the system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortions of the input images (translations, rotations, scale changes, etc.). We have implemented a scheme that allows a network to learn the derivative of its outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform.

1 INTRODUCTION

In machine learning, one very often knows more about the function to be learned than just the training data. An interesting case is when certain directional derivatives of the desired function are known at certain points.
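
A minimal sketch of the underlying mechanism, assuming a toy one-layer network and a translation tangent vector obtained by finite differences (the actual formalism propagates these derivatives analytically through the network):

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(64, 10))

def f(x):
    return np.tanh(x @ W)  # toy one-layer "recognizer"

def tangent_vector(img):
    # Tangent of horizontal translation: symmetric difference of shifts.
    return (np.roll(img, 1, axis=1) - np.roll(img, -1, axis=1)) / 2.0

img = rng.random((8, 8))
x, t = img.ravel(), tangent_vector(img).ravel()

# Directional derivative of the outputs along the distortion direction;
# its squared norm is added to the usual loss, so training drives the
# network toward invariance under the chosen distortion.
eps = 1e-4
dir_deriv = (f(x + eps * t) - f(x - eps * t)) / (2 * eps)
tangent_penalty = (dir_deriv ** 2).sum()
print("tangent penalty:", tangent_penalty)
```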


A Neural Net Model for Adaptive Control of Saccadic Accuracy by Primate Cerebellum and Brainstem

Neural Information Processing Systems

Accurate saccades require interaction between brainstem circuitry and the cerebellum. A model of this interaction is described, based on Kawato's principle of feedback-error-learning. In the model a part of the brainstem (the superior colliculus) acts as a simple feedback controller with no knowledge of initial eye position, and provides an error signal for the cerebellum to correct for eye-muscle nonlinearities. This teaches the cerebellum, modelled as a CMAC, to adjust appropriately the gain on the brainstem burst-generator's internal feedback loop and so alter the size of burst sent to the motoneurons. With direction-only errors the system rapidly learns to make accurate horizontal eye movements from any starting position, and adapts realistically to subsequent simulated eye-muscle weakening or displacement of the saccadic target.
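
A deliberately scalar caricature of feedback-error learning, assuming a one-parameter plant and an adaptive gain standing in for the CMAC (all numbers are illustrative): the feedback controller's correction doubles as the training signal for the adaptive feedforward element.

```python
gain = 0.5        # adaptive feedforward gain (the "cerebellum")
true_gain = 1.8   # unknown plant scaling to be compensated
lr = 0.05

for trial in range(200):
    target = 10.0
    command = gain * target                 # feedforward (learned) drive
    eye = command / true_gain               # plant attenuates the command
    feedback = target - eye                 # feedback controller's correction
    gain += lr * feedback * target / 100.0  # the feedback error trains the gain

print("learned gain:", gain, "(ideal:", true_gain, ")")
```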


A Topographic Product for the Optimization of Self-Organizing Feature Maps

Neural Information Processing Systems

We present a topographic product which measures the preservation of neighborhood relations as a criterion to optimize the output space topology of the map with regard to the global dimensionality DA as well as to the dimensions in the individual directions. We test the topographic product method not only on synthetic mapping examples, but also on speech data.
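
A sketch of computing such a product, reconstructed from the standard Bauer-Pawelzik definition; treat the exact formula as an assumption here. Values near zero indicate that the input and output topologies agree.

```python
import numpy as np

def topographic_product(weights, grid):
    # weights: (N, d_in) codebook vectors; grid: (N, d_out) lattice coords.
    N = len(weights)
    dV = np.linalg.norm(weights[:, None] - weights[None], axis=-1)
    dA = np.linalg.norm(grid[:, None].astype(float) - grid[None], axis=-1)
    np.fill_diagonal(dV, np.inf)  # exclude self when ranking neighbors
    np.fill_diagonal(dA, np.inf)
    nnV, nnA = dV.argsort(1), dA.argsort(1)  # neighbor orderings per space
    total = 0.0
    for j in range(N):
        logq = 0.0
        for k in range(1, N):
            a, v = nnA[j, k - 1], nnV[j, k - 1]  # k-th neighbors of unit j
            logq += (np.log(dV[j, a] / dV[j, v])
                     + np.log(dA[j, a] / dA[j, v]))
            total += logq / (2 * k)
    return total / (N * (N - 1))

# A perfectly topology-preserving 1-D chain scores (near) zero.
grid = np.arange(8)[:, None]
print(topographic_product(grid.astype(float), grid))
```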


Markov Random Fields Can Bridge Levels of Abstraction

Neural Information Processing Systems

Network vision systems must make inferences from evidential information across levels of representational abstraction, from low level invariants, through intermediate scene segments, to high level behaviorally relevant object descriptions. This paper shows that such networks can be realized as Markov Random Fields (MRFs). We show first how to construct an MRF functionally equivalent to a Hough transform parameter network, thus establishing a principled probabilistic basis for visual networks. Second, we show that these MRF parameter networks are more capable and flexible than traditional methods. In particular, they have a well-defined probabilistic interpretation, intrinsically incorporate feedback, and offer richer representations and decision capabilities.
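
A generic sketch of the MRF machinery invoked here, not the paper's Hough-transform construction: unary potentials carry bottom-up evidence, pairwise potentials couple neighboring hypotheses, and a few ICM (iterated conditional modes) sweeps settle the labeling.

```python
import numpy as np

rng = np.random.default_rng(0)
evidence = rng.normal(0, 1, size=(16, 16))  # noisy local support
evidence[4:12, 4:12] += 2.0                 # a coherent "segment"
labels = (evidence > 0).astype(int)         # initial hard decisions

beta = 0.8          # pairwise coupling strength
steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
for sweep in range(5):
    for y in range(16):
        for x in range(16):
            nbrs = [(y + dy, x + dx) for dy, dx in steps
                    if 0 <= y + dy < 16 and 0 <= x + dx < 16]
            nb = sum(labels[p] for p in nbrs)    # neighbors labeled 1
            # Local energies of the two labels at (y, x): unary evidence
            # plus a reward (-beta) for each agreeing neighbor.
            e1 = -evidence[y, x] - beta * nb
            e0 = -beta * (len(nbrs) - nb)
            labels[y, x] = int(e1 < e0)

print("fraction labeled on:", labels.mean())
```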


The Clusteron: Toward a Simple Abstraction for a Complex Neuron

Neural Information Processing Systems

The nature of information processing in complex dendritic trees has remained an open question since the origin of the neuron doctrine 100 years ago. With respect to learning, for example, it is not known whether a neuron is best modeled as a pseudo-linear unit, equivalent in power to a simple Perceptron, or as a general nonlinear learning device, equivalent in power to a multi-layered network. In an attempt to characterize the input-output behavior of a whole dendritic tree containing voltage-dependent membrane mechanisms, a recent compartmental modeling study in an anatomically reconstructed neocortical pyramidal cell (anatomical data from Douglas et al., 1991; "NEURON" simulation package provided by Michael Hines and John Moore) showed that a dendritic tree rich in NMDA-type synaptic channels is selectively responsive to spatially clustered, as opposed to diffuse, patterns of synaptic activation (Mel, 1992). For example, 100 synapses which were simultaneously activated at 100 randomly chosen locations about the dendritic arbor were less effective at firing the cell than 100 synapses activated in groups of 5, at each of 20 randomly chosen dendritic locations. The cooperativity among the synapses in each group is due to the voltage dependence of the NMDA channel: each activated NMDA synapse becomes up to three times more effective at injecting synaptic current when the post-synaptic membrane is locally depolarized by 30-40 mV from the resting potential.
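
One reading of this cooperativity as a minimal abstract unit (details assumed, scaled down from the figures above): each synapse's contribution is gated by the summed activity of its dendritic neighbors, so clustered activation drives the unit harder than diffuse activation at equal total input.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # synapse sites along the dendrite

def clusteron_response(active_sites, radius=2):
    x = np.zeros(n)
    x[active_sites] = 1.0
    resp = 0.0
    for i in active_sites:
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        resp += x[i] * x[lo:hi].sum()  # local cooperative (NMDA-like) term
    return resp

diffuse = rng.choice(n, size=20, replace=False)        # 20 scattered inputs
clustered = np.concatenate([np.arange(s, s + 5)        # 4 clusters of 5
                            for s in (10, 30, 55, 80)])

print("diffuse response:  ", clusteron_response(diffuse))
print("clustered response:", clusteron_response(clustered))
```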