Asia
Connectionist Approaches to the Use of Markov Models for Speech Recognition
Bourlard, Hervé, Morgan, Nelson, Wooters, Chuck
Previous work has shown the ability of Multilayer Perceptrons (MLPs) to estimate emission probabilities for Hidden Markov Models (HMMs). The advantages of a speech recognition system incorporating both MLPs and HMMs are the best discrimination and the ability to incorporate multiple sources of evidence (features, temporal context) without restrictive assumptions of distributions or statistical independence. This paper presents results on the speaker-dependent portion of DARPA's English language Resource Management database. Results support the previously reported utility of MLP probability estimation for continuous speech recognition. An additional approach we are pursuing is to use MLPs as nonlinear predictors for autoregressive HMMs. While this is shown to be more compatible with the HMM formalism, it still suffers from several limitations. This approach is generalized to take account of time correlation between successive observations, without any restrictive assumptions about the driving noise. 1 INTRODUCTION We have been working on continuous speech recognition using moderately large vocabularies (1000 words) [1,2].
A Recurrent Neural Network for Word Identification from Continuous Phoneme Strings
Allen, Robert B., Kamm, Candace A.
A neural network architecture was designed for locating word boundaries and identifying words from phoneme sequences. This architecture was tested in three sets of studies. First, a highly redundant corpus with a restricted vocabulary was generated and the network was trained with a limited number of phonemic variations for the words in the corpus. Tests of network performance on a transfer set yielded a very low error rate. In a second study, a network was trained to identify words from expert transcriptions of speech.
A Theory for Neural Networks with Time Delays
Vries, Bert de, Príncipe, José Carlos
We present a new neural network model for processing of temporal patterns. This model, the gamma neural model, is as general as a convolution delay model with arbitrary weight kernels w(t). We show that the gamma model can be formulated as a (partially prewired) additive model. A temporal hebbian learning rule is derived and we establish links to related existing models for temporal processing. 1 INTRODUCTION In this paper, we are concerned with developing neural nets with short term memory for processing of temporal patterns. In the literature, basically two ways have been reported to incorporate short-term memory in the neural system equations.
Phase-coupling in Two-Dimensional Networks of Interacting Oscillators
Niebur, Ernst, Kammen, Daniel M., Koch, Christof, Ruderman, Daniel L., Schuster, Heinz G.
Coherent oscillatory activity in large networks of biological or artificial neural units may be a useful mechanism for coding information pertaining to a single perceptual object or for detailing regularities within a data set. We consider the dynamics of a large array of simple coupled oscillators under a variety of connection schemes. Of particular interest is the rapid and robust phase-locking that results from a "sparse" scheme where each oscillator is strongly coupled to a tiny, randomly selected, subset of its neighbors.
Adjoint-Functions and Temporal Learning Algorithms in Neural Networks
The development of learning algorithms is generally based upon the minimization of an energy function. It is a fundamental requirement to compute the gradient of this energy function with respect to the various parameters of the neural architecture, e.g., synaptic weights, neural gain,etc. In principle, this requires solving a system of nonlinear equations for each parameter of the model, which is computationally very expensive. A new methodology for neural learning of time-dependent nonlinear mappings is presented. It exploits the concept of adjoint operators to enable a fast global computation of the network's response to perturbations in all the systems parameters. The importance of the time boundary conditions of the adjoint functions is discussed. An algorithm is presented in which the adjoint sensitivity equations are solved simultaneously (Le., forward in time) along with the nonlinear dynamics of the neural networks. This methodology makes real-time applications and hardware implementation of temporal learning feasible.
Interaction Among Ocularity, Retinotopy and On-center/Off-center Pathways During Development
The development of projections from the retinas to the cortex is mathematically analyzed according to the previously proposed thermodynamic formulation of the self-organization of neural networks. Three types of submodality included in the visual afferent pathways are assumed in two models: model (A), in which the ocularity and retinotopy are considered separately, and model (B), in which on-center/off-center pathways are considered in addition to ocularity and retinotopy. Model (A) shows striped ocular dominance spatial patterns and, in ocular dominance histograms, reveals a dip in the binocular bin. Model (B) displays spatially modulated irregular patterns and shows single-peak behavior in the histograms. When we compare the simulated results with the observed results, it is evident that the ocular dominance spatial patterns and histograms for models (A) and (B) agree very closely with those seen in monkeys and cats.
Simulation of the Neocognitron on a CCD Parallel Processing Architecture
Chuang, Michael L., Chiang, Alice M.
The neocognitron is a neural network for pattern recognition and feature extraction. An analog CCD parallel processing architecture developed at Lincoln Laboratory is particularly well suited to the computational requirements of shared-weight networks such as the neocognitron, and implementation of the neocognitron using the CCD architecture was simulated. A modification to the neocognitron training procedure, which improves network performance under the limited arithmetic precision that would be imposed by the CCD architecture, is presented.
Second Order Properties of Error Surfaces: Learning Time and Generalization
LeCun, Yann, Kanter, Ido, Solla, Sara A.
The learning time of a simple neural network model is obtained through an analytic computation of the eigenvalue spectrum for the Hessian matrix, which describes the second order properties of the cost function in the space of coupling coefficients. The form of the eigenvalue distribution suggests new techniques for accelerating the learning process, and provides a theoretical justification for the choice of centered versus biased state variables.