Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge
Bengio, Yoshua, Mori, Renato de, Cardin, Régis
Dept. of Computer Science, McGill University, Montreal, Canada H3A 2A7
ABSTRACT

We attempt to combine neural networks with knowledge from speech science to build a speaker independent speech recognition system. This knowledge is utilized in designing the preprocessing, input coding, output coding, output supervision and architectural constraints. To handle the temporal aspect of speech we combine delays, copies of activations of hidden and output units at the input level, and Back-Propagation for Sequences (BPS), a learning algorithm for networks with local self-loops. This strategy is demonstrated in several experiments, in particular a nasal discrimination task for which the application of a speech theory hypothesis dramatically improved generalization.

1 INTRODUCTION

The strategy put forward in this research effort is to combine the flexibility and learning abilities of neural networks with as much knowledge from speech science as possible in order to build a speaker independent automatic speech recognition system. This knowledge is utilized in each of the steps in the construction of an automated speech recognition system: preprocessing, input coding, output coding, output supervision, architectural design.
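The mention of Back-Propagation for Sequences (BPS) can be pictured with a small sketch: when each hidden unit carries only a local self-loop, the derivative of its state with respect to its own incoming weights can be propagated forward in time, so no unrolling of the network is needed. The layer sizes, tanh nonlinearity, linear output layer and on-line squared-error update below are illustrative assumptions, not the architecture reported in the paper.

```python
# Minimal BPS-style sketch (illustrative only): a layer of hidden units with local
# self-loops, trained on-line by carrying state sensitivities forward in time.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 5, 2                 # illustrative sizes
W_in   = rng.normal(0, 0.1, (n_hid, n_in))   # input -> hidden weights
w_self = rng.normal(0, 0.1, n_hid)           # one local self-loop per hidden unit
W_out  = rng.normal(0, 0.1, (n_out, n_hid))  # hidden -> output (static layer)

def f(a):  return np.tanh(a)
def df(a): return 1.0 - np.tanh(a) ** 2

def bps_step(u, target, x_prev, dx_dWin, dx_dwself, lr=0.01):
    """One time step: forward pass plus forward-propagated gradients for the recurrent weights."""
    a = W_in @ u + w_self * x_prev           # pre-activation of the self-loop layer
    x = f(a)
    y = W_out @ x                            # linear output layer for simplicity
    e = y - target                           # squared-error gradient at the output

    # Forward recursion of state sensitivities (possible because loops are local):
    #   d x_i / d W_in[i, j] = f'(a_i) * (u_j       + w_self[i] * previous sensitivity)
    #   d x_i / d w_self[i]  = f'(a_i) * (x_prev[i] + w_self[i] * previous sensitivity)
    fp = df(a)
    dx_dWin   = fp[:, None] * (u[None, :] + w_self[:, None] * dx_dWin)
    dx_dwself = fp * (x_prev + w_self * dx_dwself)

    # Chain through the static output layer and update all weights on-line.
    dE_dx = W_out.T @ e
    W_out[...]  -= lr * np.outer(e, x)
    W_in[...]   -= lr * dE_dx[:, None] * dx_dWin
    w_self[...] -= lr * dE_dx * dx_dwself
    return x, dx_dWin, dx_dwself

# Toy usage on a random frame sequence with per-frame targets (shapes only).
x = np.zeros(n_hid)
dx_dWin, dx_dwself = np.zeros((n_hid, n_in)), np.zeros(n_hid)
for t in range(20):
    u, tgt = rng.normal(size=n_in), rng.normal(size=n_out)
    x, dx_dWin, dx_dwself = bps_step(u, tgt, x, dx_dWin, dx_dwself)
```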
Use of Multi-Layered Networks for Coding Speech with Phonetic Features
Bengio, Yoshua, Cardin, Régis, Mori, Renato de, Cosi, Piero
Dept. of Computer Science, McGill University, Montreal, Canada H3A 2A7; Centro di Studio per le Ricerche di Fonetica, C.N.R., Via Oberdan 10, 35122 Padova, Italy
ABSTRACT

Preliminary results on speaker-independent speech recognition are reported. A method that combines expertise on neural networks with expertise on speech recognition is used to build the recognition systems. For transient sounds, event-driven property extractors with variable resolution in the time and frequency domains are used. For sonorant speech, a model of the human auditory system is preferred to the FFT as a front-end module.

INTRODUCTION

Combining a structural or knowledge-based approach for describing speech units with neural networks capable of automatically learning relations between acoustic properties and speech units is the research effort we are attempting.
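The idea of "variable resolution in the time and frequency domains" around detected events can be illustrated with a minimal sketch: short analysis windows (fine time resolution) are used where a transient is detected, and long windows (fine frequency resolution) elsewhere. The energy-based detector, the thresholds and the window lengths below are assumptions for illustration, not the property extractors used in the paper.

```python
# Illustrative variable-resolution spectral analysis around high-energy transients.
import numpy as np

def variable_resolution_spectra(signal, fs, event_thresh=0.2,
                                short_win=64, long_win=512, hop=64):
    """Return (time, magnitude-spectrum) pairs: short windows near detected events,
    long windows elsewhere."""
    spectra = []
    for start in range(0, len(signal) - long_win, hop):
        frame_energy = np.mean(signal[start:start + short_win] ** 2)
        win = short_win if frame_energy > event_thresh else long_win
        frame = signal[start:start + win] * np.hanning(win)
        spectra.append((start / fs, np.abs(np.fft.rfft(frame))))
    return spectra

# Toy usage: a weak tone with a brief high-energy burst acting as the "event".
fs = 16000
t = np.arange(fs) / fs
sig = 0.1 * np.sin(2 * np.pi * 440 * t)
sig[8000:8200] += np.random.default_rng(0).normal(0, 1.0, 200)
for time, spec in variable_resolution_spectra(sig, fs)[:5]:
    print(f"t={time:.3f}s  bins={len(spec)}")
```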