Hennecke, Marcus
Lipreading by neural networks: Visual preprocessing, learning, and sensory integration
Wolff, Gregory J., Prasad, K. Venkatesh, Stork, David G., Hennecke, Marcus
Automated speech recognition is notoriously hard, and thus any predictive source of information and constraints that could be incorporated into a computer speech recognition system would be desirable. Humans, especially the hearing impaired, can utilize visual information - "speech reading" - for improved accuracy (Dodd & Campbell, 1987; Sanders & Goodrich, 1971). Speech reading can provide direct information about segments, phonemes, rate, speaker gender and identity, and subtle information for segmenting speech from background noise or multiple speakers (De Filippo & Sims, 1988; Green & Miller, 1985). Fundamental support for the use of visual information comes from the complementary nature of the visual and acoustic speech signals. Utterances that are difficult to distinguish acoustically are the easiest to distinguish visually.