Speech Recognition Using Demi-Syllable Neural Prediction Model

Neural Information Processing Systems

The Neural Prediction Model is a speech recognition model based on pattern prediction by multilayer perceptrons. Its effectiveness was confirmed by speaker-independent digit recognition experiments. This paper presents an improvement to the model and its application to large vocabulary speech recognition based on subword units. The improvement is the introduction of "backward prediction," which raises prediction accuracy beyond that of the original model, which used only "forward prediction." In applying the model to speaker-dependent large vocabulary speech recognition, the demi-syllable is used as the subword recognition unit.


Spoken Letter Recognition

Neural Information Processing Systems

Through the use of neural network classifiers and careful feature selection, we have achieved high-accuracy speaker-independent spoken letter recognition. For isolated letters, a broad-category segmentation is performed. Location of segment boundaries allows us to measure features at specific locations in the signal such as vowel onset, where important information resides. Letter classification is performed with a feed-forward neural network. Recognition accuracy on a test set of 30 speakers was 96%. Neural network classifiers are also used for pitch tracking and broad-category segmentation of letter strings.


Connectionist Approaches to the Use of Markov Models for Speech Recognition

Neural Information Processing Systems

Previous work has shown the ability of Multilayer Perceptrons (MLPs) to estimate emission probabilities for Hidden Markov Models (HMMs). The advantages of a speech recognition system incorporating both MLPs and HMMs are the best discrimination and the ability to incorporate multiple sources of evidence (features, temporal context) without restrictive assumptions of distributions or statistical independence. This paper presents results on the speaker-dependent portion of DARPA's English language Resource Management database. Results support the previously reported utility of MLP probability estimation for continuous speech recognition. An additional approach we are pursuing is to use MLPs as nonlinear predictors for autoregressive HMMs. While this is shown to be more compatible with the HMM formalism, it still suffers from several limitations. This approach is generalized to take account of time correlation between successive observations, without any restrictive assumptions about the driving noise. 1 INTRODUCTION We have been working on continuous speech recognition using moderately large vocabularies (1000 words) [1,2].


A Recurrent Neural Network for Word Identification from Continuous Phoneme Strings

Neural Information Processing Systems

A neural network architecture was designed for locating word boundaries and identifying words from phoneme sequences. This architecture was tested in three sets of studies. First, a highly redundant corpus with a restricted vocabulary was generated and the network was trained with a limited number of phonemic variations for the words in the corpus. Tests of network performance on a transfer set yielded a very low error rate. In a second study, a network was trained to identify words from expert transcriptions of speech.


Continuous Speech Recognition by Linked Predictive Neural Networks

Neural Information Processing Systems

We present a large vocabulary, continuous speech recognition system based on Linked Predictive Neural Networks (LPNN's). The system uses neural networks as predictors of speech frames, yielding distortion measures which are used by the One Stage DTW algorithm to perform continuous speech recognition. The system, already deployed in a Speech to Speech Translation system, currently achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 5, 111, and 402 respectively, outperforming several simple HMMs that we tested. We also found that the accuracy and speed of the LPNN can be slightly improved by the judicious use of hidden control inputs. We conclude by discussing the strengths and weaknesses of the predictive approach.
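The predictive approach described above can be illustrated with a minimal sketch. It assumes that each state of a word model owns a predictor that maps the previous frame to the next one, that the local distortion is the squared prediction error, and that frames are aligned to a left-to-right chain of states by dynamic programming. The name `align_cost` and the toy predictors are hypothetical stand-ins for trained networks, and the alignment is a simplified single-model version, not the One Stage DTW algorithm over word sequences.

```python
def align_cost(frames, predictors):
    """Minimum total prediction distortion of `frames` against a
    left-to-right chain of per-state predictors (hypothetical stand-ins
    for trained networks). Requires at least two frames."""
    num_frames, num_states = len(frames), len(predictors)
    inf = float("inf")

    def distortion(t, s):
        # Squared error when state s predicts frame t from frame t-1.
        predicted = predictors[s](frames[t - 1])
        return sum((p - x) ** 2 for p, x in zip(predicted, frames[t]))

    # cost[s]: best cumulative distortion ending in state s at the current frame.
    cost = [inf] * num_states
    cost[0] = distortion(1, 0)
    for t in range(2, num_frames):
        new_cost = [inf] * num_states
        for s in range(num_states):
            # Either stay in state s or advance from state s-1.
            best_prev = cost[s] if s == 0 else min(cost[s], cost[s - 1])
            if best_prev < inf:
                new_cost[s] = best_prev + distortion(t, s)
        cost = new_cost
    return cost[num_states - 1]


# Toy predictors: one expects a rising signal, the other a constant one.
ramp = lambda frame: [frame[0] + 1.0]
hold = lambda frame: [frame[0]]

frames = [[0.0], [1.0], [2.0], [2.0], [2.0]]
matching = align_cost(frames, [ramp, hold])     # rise, then hold
mismatched = align_cost(frames, [hold, ramp])   # hold, then rise
```

In a recognizer built this way, the model whose predictors yield the lowest cumulative distortion would be chosen; here the rise-then-hold model fits the toy frames better than the reversed one.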


The Recurrent Cascade-Correlation Architecture

Neural Information Processing Systems

Recurrent Cascade-Correlation (RCC) is a recurrent version of the Cascade-Correlation learning architecture of Fahlman and Lebiere [Fahlman, 1990]. RCC can learn from examples to map a sequence of inputs into a desired sequence of outputs. New hidden units with recurrent connections are added to the network as needed during training. In effect, the network builds up a finite-state machine tailored specifically for the current problem. RCC retains the advantages of Cascade-Correlation: fast learning, good generalization, automatic construction of a near-minimal multi-layered network, and incremental training. Initially the network contains only inputs, output units, and the connections between them.


Learning Time-varying Concepts

Neural Information Processing Systems

This work extends computational learning theory to situations in which concepts vary over time, e.g., system identification of a time-varying plant. We have extended formal definitions of concepts and learning to provide a framework in which an algorithm can track a concept as it evolves over time. Given this framework and focusing on memory-based algorithms, we have derived some PAC-style sample complexity results that determine, for example, when tracking is feasible. We have also used a similar framework and focused on incremental tracking algorithms for which we have derived some bounds on the mistake or error rates for some specific concept classes. 1 INTRODUCTION The goal of our ongoing research is to extend computational learning theory to include concepts that can change or evolve over time. For example, face recognition is complicated by the fact that a person's face changes slowly with age and more quickly with changes in makeup, hairstyle, or facial hair.


Statistical Mechanics of Temporal Association in Neural Networks

Neural Information Processing Systems

Basic computational functions of associative neural structures may be analytically studied within the framework of attractor neural networks where static patterns are stored as stable fixed-points for the system's dynamics. If the interactions between single neurons are instantaneous and mediated by symmetric couplings, there is a Lyapunov function for the retrieval dynamics (Hopfield 1982). The global computation corresponds in that case to a downhill motion in an energy landscape created by the stored information. Methods of equilibrium statistical mechanics may be applied and permit a quantitative analysis of the asymptotic network behavior (Amit et al. 1985, 1987). The existence of a Lyapunov function is thus of great conceptual as well as technical importance. Nevertheless, one should be aware that environmental inputs to a neural net always provide information in both space and time. It is therefore desirable to extend the original Hopfield scheme and to explore possibilities for a joint representation of static patterns and temporal associations.
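The "downhill motion in an energy landscape" can be checked numerically with a small sketch. It assumes ±1 neurons, symmetric couplings with zero self-coupling built by a Hebbian rule, and asynchronous sign updates; these choices and all function names are illustrative assumptions, not details taken from the abstract.

```python
import random


def hebbian_weights(patterns):
    """Symmetric couplings J[i][j] from +/-1 patterns, with zero diagonal."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += p[i] * p[j] / n
    return J


def energy(J, s):
    """E(s) = -1/2 * sum_ij J_ij s_i s_j."""
    n = len(s)
    return -0.5 * sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def retrieve(J, state, sweeps=5, seed=0):
    """Asynchronous sign updates; records the energy after every single update."""
    rng = random.Random(seed)
    s = list(state)
    n = len(s)
    energies = [energy(J, s)]
    for _ in range(sweeps):
        for i in rng.sample(range(n), n):   # visit neurons in random order
            field = sum(J[i][j] * s[j] for j in range(n))
            s[i] = 1 if field >= 0 else -1
            energies.append(energy(J, s))
    return s, energies


stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
J = hebbian_weights(stored)
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]        # first pattern with one bit flipped
final_state, energies = retrieve(J, noisy)
```

Under these assumptions the recorded energy never increases from one update to the next, and the corrupted input settles on the first stored pattern, which is then a fixed point of the dynamics.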


ART2/BP architecture for adaptive estimation of dynamic processes

Neural Information Processing Systems

The goal has been to construct a supervised artificial neural network that incrementally learns an unknown mapping. To this end, a network combining ART2 and backpropagation is proposed and is called an "ART2/BP" network. The ART2 network is used to build and focus a supervised backpropagation network. The ART2/BP network has the advantage of being able to dynamically expand itself in response to input patterns containing new information. Simulation results show that the ART2/BP network outperforms a classical maximum likelihood method for the estimation of a discrete dynamic and nonlinear transfer function.


A Theory for Neural Networks with Time Delays

Neural Information Processing Systems

We present a new neural network model for processing of temporal patterns. This model, the gamma neural model, is as general as a convolution delay model with arbitrary weight kernels w(t). We show that the gamma model can be formulated as a (partially prewired) additive model. A temporal Hebbian learning rule is derived and we establish links to related existing models for temporal processing. 1 INTRODUCTION In this paper, we are concerned with developing neural nets with short-term memory for processing of temporal patterns. In the literature, basically two ways have been reported to incorporate short-term memory in the neural system equations.
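To make the idea of a short-term memory with adjustable delay kernels concrete, here is a minimal sketch of one common discrete-time form of a gamma memory. The recursion, the parameter name `mu`, and the function name `gamma_memory` are assumptions drawn from the general literature on gamma memories rather than from this abstract, which gives no equations.

```python
def gamma_memory(u, order, mu):
    """Return taps x[k][t] for k = 0..order, given an input sequence u.

    Assumed recursion:
        x_0(t) = u(t)
        x_k(t) = (1 - mu) * x_k(t-1) + mu * x_{k-1}(t-1),   k >= 1
    With mu = 1 every tap is a pure one-step delay of the previous tap.
    """
    steps = len(u)
    x = [[0.0] * steps for _ in range(order + 1)]
    x[0] = [float(v) for v in u]
    for t in range(1, steps):
        for k in range(1, order + 1):
            x[k][t] = (1.0 - mu) * x[k][t - 1] + mu * x[k - 1][t - 1]
    return x


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
delay_line = gamma_memory(impulse, order=2, mu=1.0)   # tapped delay line
smeared = gamma_memory(impulse, order=2, mu=0.5)      # dispersive memory
```

With `mu = 1.0` the taps reproduce the impulse at delays of one and two steps, while a smaller `mu` spreads each tap's response over time, trading temporal resolution for memory depth. A unit would then form a weighted sum of the taps, in place of the arbitrary kernel w(t) of a general convolution model.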