Speech Overview

Understanding and Generating Spoken Language

... Simple inquiries about bank balance, movie schedules, and phone call transfers can already be handled by telephone-speech recognizers. ... Voice activated data entry is particularly useful in medical or darkroom applications, where hands and eyes are unavailable, or in hands-busy or eyes-busy command and control applications. Speech could be used to provide more accessibility for the handicapped ... and to create high-tech amenities (intelligent houses, cars, etc.)
- Alex Waibel and Kai-Fu Lee, from Readings in Speech Recognition 

The 1990s saw the first commercialization of spoken language understanding systems. Computers can now understand and react to humans speaking naturally in ordinary language within a limited domain. Basic and applied research in signal processing, computational linguistics and artificial intelligence has been combined to open up new possibilities in human-computer interfaces.

Definition of the Area

Speech Understanding: "Automatic speech recognition (ASR) is one of the fastest growing and commercially most promising applications of natural language technology. Speech is the most natural communicative medium for humans in many situations, including applications such as giving dictation; querying database or information-retrieval systems; or generally giving commands to a computer or other device, especially in environments where keyboard input is awkward or impossible (for example, because one’s hands are required for other tasks)." From Linguistic Knowledge and Empirical Methods in Speech Recognition. By Andreas Stolcke. (1997). AI Magazine 18 (4): 25-32.

Speech Synthesis: "Synthetic-speech researchers ... have been tackling a much tougher challenge: making computers say anything a live person could say, and in a voice that sounds natural." From Making Computers Talk. Andy Aaron, Ellen Eide and John F. Pitrelli. Scientific American Explore (March 17, 2003). 


