Stochastic Neurodynamics

Neural Information Processing Systems

The main point of this paper is that stochastic neural networks have a mathematical structure that corresponds quite closely with that of quantum field theory. Neural network Liouvillians and Lagrangians can be derived, just as spin Hamiltonians and Lagrangians can be in QFT. It remains to show the efficacy of such a description.


Distributed Recursive Structure Processing

Neural Information Processing Systems

Harmonic grammar (Legendre et al., 1990) is a connectionist theory of linguistic well-formedness based on the assumption that the well-formedness of a sentence can be measured by the harmony (negative energy) of the corresponding connectionist state. Assuming a lower-level connectionist network that obeys a few general connectionist principles but is otherwise unspecified, we construct a higher-level network with an equivalent harmony function that captures the most linguistically relevant global aspects of the lower-level network. In this paper, we extend the tensor product representation (Smolensky, 1990) to fully recursive representations of recursively structured objects like sentences in the lower-level network. We show theoretically and with an example the power of the new technique for parallel distributed structure processing.
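The abstract does not spell the construction out; as a rough illustration of tensor product binding, the following numpy sketch (my own example, with hypothetical orthonormal role vectors for "left" and "right" and the toy tree ((A B) C)) shows how fillers are bound to positional roles by outer products, how an embedded constituent simply lives in a deeper-rank tensor, and how constituents can be unbound again by contracting with their roles.

import numpy as np

dim = 8
rng = np.random.default_rng(0)
# Three filler vectors A, B, C (orthonormal here, but any vectors would do).
A, B, C = np.linalg.qr(rng.standard_normal((dim, 3)))[0].T
# Orthonormal role vectors for "left child" and "right child".
r_left, r_right = np.eye(dim)[0], np.eye(dim)[1]

# Tree ((A B) C): C is the right child at depth 1; A and B sit at depth 2
# inside the left child. The full representation is the collection (direct
# sum) of the depth-wise tensors.
depth1 = np.einsum("i,j->ij", C, r_right)                  # C (x) r_right
depth2 = (np.einsum("i,j,k->ijk", A, r_left, r_left)       # A (x) r_left (x) r_left
          + np.einsum("i,j,k->ijk", B, r_right, r_left))   # B (x) r_right (x) r_left

# Unbinding recovers a constituent by contracting with its role vectors;
# this is exact here because the roles are orthonormal.
A_back = np.einsum("ijk,j,k->i", depth2, r_left, r_left)
print(np.allclose(A_back, A))   # True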


Time Trials on Second-Order and Variable-Learning-Rate Algorithms

Neural Information Processing Systems

The performance of seven minimization algorithms is compared on five neural network problems. These include a variable-step-size algorithm, conjugate gradient, and several methods with explicit analytic or numerical approximations to the Hessian.
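As a toy illustration of this kind of comparison (my own harness and toy problem, not the authors' benchmarks or timing methodology), one can time a conjugate-gradient method against a quasi-Newton method that maintains a Hessian approximation on a small network-fitting task using scipy:

import time
import numpy as np
from scipy.optimize import minimize

# Fit a tiny one-hidden-layer tanh network to noisy 1-D data and time two of
# the method families mentioned above.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(50)
n_hidden = 5

def unpack(p):
    w1, b1, w2, b2 = np.split(p, [n_hidden, 2 * n_hidden, 3 * n_hidden])
    return w1, b1, w2, b2[0]

def loss(p):
    w1, b1, w2, b2 = unpack(p)
    h = np.tanh(np.outer(x, w1) + b1)   # hidden-unit activations
    yhat = h @ w2 + b2                  # linear output unit
    return np.mean((yhat - y) ** 2)

p0 = 0.1 * rng.standard_normal(3 * n_hidden + 1)
for method in ("CG", "BFGS"):           # conjugate gradient vs. quasi-Newton
    t0 = time.perf_counter()
    res = minimize(loss, p0, method=method)
    print(f"{method:5s} loss={res.fun:.4f}  time={time.perf_counter() - t0:.3f}s")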


Remarks on Interpolation and Recognition Using Neural Nets

Neural Information Processing Systems

We consider different types of single-hidden-layer feedforward nets: with or without direct input to output connections, and using either threshold or sigmoidal activation functions. The main results show that direct connections in threshold nets double the recognition but not the interpolation power, while using sigmoids rather than thresholds allows (at least) doubling both. Various results are also given on VC dimension and other measures of recognition capabilities.
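For concreteness, here is a minimal numpy sketch of the architectures being compared (my own illustration, with hypothetical weight names): a single hidden layer of either threshold or sigmoidal units, with optional direct connections from the input to the output.

import numpy as np

def forward(x, W_h, b_h, w_o, b_o, w_direct=None, activation="threshold"):
    """x: (n,) input; W_h: (k, n) hidden weights; w_o: (k,) output weights;
    w_direct: optional (n,) direct input-to-output weights."""
    pre = W_h @ x + b_h
    if activation == "threshold":
        h = (pre > 0).astype(float)        # Heaviside (threshold) units
    else:
        h = 1.0 / (1.0 + np.exp(-pre))     # sigmoidal units
    out = w_o @ h + b_o
    if w_direct is not None:               # direct input-output connections
        out += w_direct @ x
    return out

# Toy usage with random weights.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W_h, b_h = rng.standard_normal((3, 4)), rng.standard_normal(3)
w_o, b_o = rng.standard_normal(3), 0.0
print(forward(x, W_h, b_h, w_o, b_o, w_direct=rng.standard_normal(4)))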


Discovering Discrete Distributed Representations with Iterative Competitive Learning

Neural Information Processing Systems

Competitive learning is an unsupervised algorithm that classifies input patterns into mutually exclusive clusters. In a neural net framework, each cluster is represented by a processing unit that competes with others in a winner-take-all pool for an input pattern. I present a simple extension to the algorithm that allows it to construct discrete, distributed representations. Discrete representations are useful because they are relatively easy to analyze and their information content can readily be measured. Distributed representations are useful because they explicitly encode similarity. The basic idea is to apply competitive learning iteratively to an input pattern, and after each stage to subtract from the input pattern the component that was captured in the representation at that stage. This component is simply the weight vector of the winning unit of the competitive pool. The subtraction procedure forces competitive pools at different stages to encode different aspects of the input. The algorithm is essentially the same as a traditional data compression technique known as multistep vector quantization, although the neural net perspective suggests potentially powerful extensions to that approach.
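A minimal sketch of this procedure, assuming plain winner-take-all updates and hypothetical parameter names (learning rate, number of stages and units), might look as follows; each stage quantizes the residual left by the previous stage, which is exactly the structure of multistep vector quantization.

import numpy as np

def iterative_competitive_learning(X, n_stages=3, n_units=4,
                                   lr=0.05, n_epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    residual = X.copy()
    codebooks, codes = [], []
    for stage in range(n_stages):
        W = 0.01 * rng.standard_normal((n_units, X.shape[1]))
        for _ in range(n_epochs):                  # train this stage's pool
            for x in residual:
                winner = np.argmin(np.sum((W - x) ** 2, axis=1))
                W[winner] += lr * (x - W[winner])  # move winner toward input
        winners = np.argmin(
            ((residual[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
        codes.append(winners)                      # discrete code for this stage
        residual = residual - W[winners]           # subtract captured component
        codebooks.append(W)
    return codebooks, np.stack(codes, axis=1)      # one winner index per stage

X = np.random.default_rng(1).standard_normal((100, 8))
_, code = iterative_competitive_learning(X)
print(code[:3])   # each pattern gets a distributed, discrete code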


Can neural networks do better than the Vapnik-Chervonenkis bounds?

Neural Information Processing Systems

These experiments are designed to test whether average generalization performance can surpass the worst-case bounds obtained from formal learning theory using the Vapnik-Chervonenkis dimension (Blumer et al., 1989). We indeed find that, in some cases, the average generalization is significantly better than the VC bound: the approach to perfect performance is exponential in the number of examples m, rather than the 1/m result of the bound. In other cases, we do find the 1/m behavior of the VC bound, and in these cases the numerical prefactor is closely related to the prefactor contained in the bound.
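To make the contrast concrete, the following snippet (illustrative functional forms and hypothetical constants only, not the paper's experiments or the exact bound of Blumer et al.) tabulates a 1/m-style decay against an exponential decay in the number of examples m.

import numpy as np

d = 10                          # VC dimension (hypothetical value)
m = np.arange(20, 2001, 20)     # number of training examples

vc_style_bound = d * np.log(m / d + 1) / m   # ~ (d/m) log m worst-case behavior
exponential_curve = np.exp(-m / 200.0)       # ~ exp(-m/m0), m0 hypothetical

for mi, b, e in zip(m[::25], vc_style_bound[::25], exponential_curve[::25]):
    print(f"m={mi:5d}  1/m-style bound={b:.4f}  exponential curve={e:.2e}")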


Connectionist Approaches to the Use of Markov Models for Speech Recognition

Neural Information Processing Systems

Previous work has shown the ability of Multilayer Perceptrons (MLPs) to estimate emission probabilities for Hidden Markov Models (HMMs). The advantages of a speech recognition system incorporating both MLPs and HMMs are better discrimination and the ability to incorporate multiple sources of evidence (features, temporal context) without restrictive assumptions of distributions or statistical independence. This paper presents results on the speaker-dependent portion of DARPA's English language Resource Management database. Results support the previously reported utility of MLP probability estimation for continuous speech recognition. An additional approach we are pursuing is to use MLPs as nonlinear predictors for autoregressive HMMs. While this is shown to be more compatible with the HMM formalism, it still suffers from several limitations. This approach is generalized to take account of time correlation between successive observations, without any restrictive assumptions about the driving noise. We have been working on continuous speech recognition using moderately large vocabularies (1000 words) [1,2].
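A standard recipe for using MLP outputs inside an HMM decoder, given here as a generic sketch rather than the exact procedure of this paper, is to convert the network's state posteriors into scaled likelihoods by dividing by the state priors:

import numpy as np

def scaled_log_likelihoods(posteriors, priors, floor=1e-8):
    """posteriors: (n_frames, n_states) MLP outputs P(state | frame);
    priors: (n_states,) relative state frequencies from training.
    Returns log P(frame | state) up to a state-independent factor."""
    scaled = np.maximum(posteriors, floor) / np.maximum(priors, floor)
    return np.log(scaled)

# Toy usage with hypothetical numbers; the result would replace the HMM
# emission probabilities during Viterbi decoding.
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1]])
priors = np.array([0.5, 0.3, 0.2])
print(scaled_log_likelihoods(posteriors, priors))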


Designing Linear Threshold Based Neural Network Pattern Classifiers

Neural Information Processing Systems

The three problems that concern us are identifying a natural domain of pattern classification applications of feedforward neural networks, selecting an appropriate feedforward network architecture, and assessing the tradeoff between network complexity, training set size, and statistical reliability as measured by the probability of incorrect classification. We close with some suggestions for improving the bounds that come from Vapnik-Chervonenkis theory; these can narrow, but not close, the chasm between theory and practice. Neural networks are appropriate as pattern classifiers when the pattern sources are ones of which we have little understanding, beyond perhaps a nonparametric statistical model, but we have been provided with classified samples of features drawn from each of the pattern categories. Neural networks should be able to provide rapid and reliable computation of complex decision functions. The issue in doubt is their statistical response to new inputs.


Neural Networks Structured for Control Application to Aircraft Landing

Neural Information Processing Systems

A recurrent back-propagation neural network architecture was designed to numerically estimate the parameters of an optimal nonlinear control law for landing the aircraft, and the performance of the network was then evaluated.