
Collaborating Authors

 Fanty, Mark


English Alphabet Recognition with Telephone Speech

Neural Information Processing Systems

Mark Fanty, Ronald A. Cole and Krist Roginski
Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, 19600 N.W. Von Neumann Dr., Beaverton, OR 97006

Abstract

A recognition system is reported which recognizes names spelled over the telephone with brief pauses between letters. The system uses separate neural networks to locate segment boundaries and classify letters. The letter scores are then used to search a database of names to find the best-scoring name. The speaker-independent classification rate for spoken letters is 89%. The system retrieves the correct name, spelled with pauses between letters, 91% of the time from a database of 50,000 names.

1 INTRODUCTION

The English alphabet is difficult to recognize automatically because many letters sound alike; e.g., B/D, P/T, V/Z and F/S. When letters are spoken over the telephone, the limited bandwidth of the channel further reduces the information needed to discriminate among several of these pairs, such as F/S, P/T, B/D and V/Z. Speaker-independent recognition of spelled names over the telephone is also difficult because of variability caused by channel distortions, different handsets, and a variety of background noises. Finally, when dealing with a large population of speakers, dialect and foreign accents alter letter pronunciations. An R from a Boston speaker may not contain an [r]. Human classification performance on telephone speech underscores the difficulty of the problem.
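The abstract says the letter scores are used to search a database of names for the best-scoring name. The sketch below illustrates that idea under simplifying assumptions, and it is not the authors' search procedure. It assumes exactly one classified segment per spoken letter, with no inserted or deleted letters. It treats each segment's output as a table of per-letter scores and scores a name as the sum of log scores. The floor value, the function names (`name_log_score`, `best_name`), and the example scores are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical floor: a letter the classifier gave no score for is heavily
# penalized rather than ruled out entirely.
FLOOR = 1e-6


def name_log_score(name: str, letter_scores: Sequence[Dict[str, float]]) -> float:
    """Sum of log letter scores for `name`, one score table per spoken letter.

    Assumes exactly one classified segment per letter, so a name whose length
    differs from the number of segments is rejected outright.
    """
    if len(name) != len(letter_scores):
        return float("-inf")
    return sum(
        math.log(max(scores.get(ch, 0.0), FLOOR))
        for ch, scores in zip(name, letter_scores)
    )


def best_name(
    names: Sequence[str], letter_scores: Sequence[Dict[str, float]]
) -> Tuple[Optional[str], float]:
    """Linear scan of the name list, returning the best-scoring name and its score."""
    best, best_score = None, float("-inf")
    for name in names:
        score = name_log_score(name.upper(), letter_scores)
        if score > best_score:
            best, best_score = name, score
    return best, best_score


# Hypothetical classifier output for a spelled three-letter name.
letter_scores = [
    {"D": 0.6, "B": 0.4},  # B/D confusion: D scores slightly higher
    {"O": 0.9, "U": 0.1},
    {"B": 0.7, "D": 0.3},
]
names = ["BOB", "ROD", "TOM", "TODD"]
print(best_name(names, letter_scores)[0])  # BOB
```

Picking the top-scoring letter at each position would spell "DOB", which is not in the list. Restricting the search to valid names recovers "BOB", which shows how a name database can absorb individual B/D-style confusions. A plain linear scan is simple, and it is plausible for a list the size of the one reported. A real system might prune the search or model letter insertions and deletions.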


Spoken Letter Recognition

Neural Information Processing Systems

Through the use of neural network classifiers and careful feature selection, we have achieved high-accuracy speaker-independent spoken letter recognition. For isolated letters, a broad-category segmentation is performed. Locating segment boundaries allows us to measure features at specific locations in the signal, such as vowel onset, where important information resides. Letter classification is performed with a feed-forward neural network. Recognition accuracy on a test set of 30 speakers was 96%. Neural network classifiers are also used for pitch tracking and broad-category segmentation of letter strings.
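The pipeline described above has three steps: broad-category segmentation, features measured at a located boundary such as vowel onset, and a feed-forward classifier. It can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. In the described system the frame labels come from a segmentation network. Here they are supplied by hand, and the category names ("sil", "fric", "vowel") are hypothetical. The frame offsets, layer sizes, class list, and random untrained weights are also hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[str, int, int]  # (category, start frame, end frame exclusive)


def segments_from_frames(labels: Sequence[str]) -> List[Segment]:
    """Merge runs of identical per-frame broad-category labels into segments."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


def vowel_onset(segments: Sequence[Segment]) -> Optional[int]:
    """First frame of the first segment labeled 'vowel' (hypothetical label), if any."""
    for category, start, _ in segments:
        if category == "vowel":
            return start
    return None


def features_at(frames: Sequence[Sequence[float]], anchor: int,
                offsets: Sequence[int]) -> List[float]:
    """Concatenate frame vectors at fixed offsets from `anchor`, clamped to the signal."""
    feats: List[float] = []
    for off in offsets:
        idx = min(max(anchor + off, 0), len(frames) - 1)
        feats.extend(frames[idx])
    return feats


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlp_forward(x, w1, b1, w2, b2) -> List[float]:
    """One sigmoid hidden layer, softmax output: one probability per class."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: 10 frames of 2-dimensional features.
labels = ["sil"] * 2 + ["fric"] * 3 + ["vowel"] * 4 + ["sil"]
frames = [[float(i), float(-i)] for i in range(10)]
segments = segments_from_frames(labels)
onset = vowel_onset(segments)                       # frame 5
x = features_at(frames, onset, offsets=(-2, 0, 2))  # frames 3, 5, 7

# Untrained, randomly initialized weights (a trained network would replace these).
rng = random.Random(0)
n_in, n_hidden, classes = len(x), 4, ["B", "D", "E"]
w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in classes]
b2 = [0.0] * len(classes)
probs = mlp_forward(x, w1, b1, w2, b2)
print(dict(zip(classes, probs)))
```

The features are anchored to a located boundary rather than to a fixed time from the start of the utterance. As a result, the same input positions always describe the same part of the letter, such as the transition into the vowel, regardless of how long the preceding silence or consonant lasts.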