Waibel, Alex
JANUS: Speech-to-Speech Translation Using Connectionist and Non-Connectionist Techniques
Waibel, Alex, Jain, Ajay N., McNair, Arthur E., Tebelskis, Joe, Osterholtz, Louise, Saito, Hiroaki, Schmidbauer, Otto, Sloboda, Tilo, Woszczyna, Monika
JANUS translates continuously spoken English and German into German, English, and Japanese. JANUS currently achieves 87% translation fidelity from English speech and 97% from German speech. We present the JANUS system along with comparative evaluations of its interchangeable processing components, with special emphasis on the connectionist modules.
Continuous Speech Recognition by Linked Predictive Neural Networks
Tebelskis, Joe, Waibel, Alex, Petek, Bojan, Schmidbauer, Otto
We present a large vocabulary, continuous speech recognition system based on Linked Predictive Neural Networks (LPNN's). The system uses neural networks as predictors of speech frames, yielding distortion measures which are used by the One Stage DTW algorithm to perform continuous speech recognition. The system, already deployed in a Speech to Speech Translation system, currently achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 5, 111, and 402 respectively, outperforming several simple HMMs that we tested. We also found that the accuracy and speed of the LPNN can be slightly improved by the judicious use of hidden control inputs. We conclude by discussing the strengths and weaknesses of the predictive approach.
The Tempo 2 Algorithm: Adjusting Time-Delays By Supervised Learning
Bodenhausen, Ulrich, Waibel, Alex
In this work we describe a new method that adjusts time-delays and the widths of time-windows in artificial neural networks automatically. The input of the units are weighted by a gaussian input-window over time which allows the learning rules for the delays and widths to be derived in the same way as it is used for the weights. Our results on a phoneme classification task compare well with results obtained with the TDNN by Waibel et al., which was manually optimized for the same task.
Continuous Speech Recognition by Linked Predictive Neural Networks
Tebelskis, Joe, Waibel, Alex, Petek, Bojan, Schmidbauer, Otto
We present a large vocabulary, continuous speech recognition system based on Linked Predictive Neural Networks (LPNN's). The system uses neural networks as predictors of speech frames, yielding distortion measures which are used by the One Stage DTW algorithm to perform continuous speech recognition. The system, already deployed in a Speech to Speech Translation system, currently achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 5, 111, and 402 respectively, outperforming several simple HMMs that we tested. We also found that the accuracy and speed of the LPNN can be slightly improved by the judicious use of hidden control inputs. We conclude by discussing the strengths and weaknesses of the predictive approach.
Continuous Speech Recognition by Linked Predictive Neural Networks
Tebelskis, Joe, Waibel, Alex, Petek, Bojan, Schmidbauer, Otto
We present a large vocabulary, continuous speech recognition system based on Linked Predictive Neural Networks (LPNN's). The system uses neural networksas predictors of speech frames, yielding distortion measures which are used by the One Stage DTW algorithm to perform continuous speech recognition. The system, already deployed in a Speech to Speech Translation system, currently achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 5, 111, and 402 respectively, outperforming several simpleHMMs that we tested. We also found that the accuracy and speed of the LPNN can be slightly improved by the judicious use of hidden control inputs. We conclude by discussing the strengths and weaknesses of the predictive approach.
The Tempo 2 Algorithm: Adjusting Time-Delays By Supervised Learning
Bodenhausen, Ulrich, Waibel, Alex
In this work we describe a new method that adjusts time-delays and the widths of time-windows in artificial neural networks automatically. The input of the units are weighted by a gaussian input-window over time which allows the learning rules for the delays and widths to be derived in the same way as it is used for the weights. Our results on a phoneme classification task compare well with results obtained with the TDNN by Waibel et al., which was manually optimized for the same task.
The Tempo 2 Algorithm: Adjusting Time-Delays By Supervised Learning
Bodenhausen, Ulrich, Waibel, Alex
In this work we describe a new method that adjusts time-delays and the widths of time-windows in artificial neural networks automatically. The input of the units are weighted by a gaussian input-window over time which allows the learning rules for the delays and widths to be derived in the same way as it is used for the weights. Our results on a phoneme classification task compare well with results obtained with the TDNN by Waibel et al., which was manually optimized for the same task.
Incremental Parsing by Modular Recurrent Connectionist Networks
Jain, Ajay N., Waibel, Alex
We present a novel, modular, recurrent connectionist network architecture which learns to robustly perform incremental parsing of complex sentences. From sequential input, one word at a time, our networks learn to do semantic role assignment, noun phrase attachment, and clause structure recognition for sentences with passive constructions and center embedded clauses. The networks make syntactic and semantic predictions at every point in time, and previous predictions are revised as expectations are affirmed or violated with the arrival of new information. Our networks induce their own "grammar rules" for dynamically transforming an input sequence of words into a syntactic/semantic interpretation. These networks generalize and display tolerance to input which has been corrupted in ways common in spoken language.