Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch
Saul, Lawrence K., Lee, Daniel D., Isbell, Charles L., LeCun, Yann
Neural Information Processing Systems
We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit.
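The idea of reading both a frequency and an error signal from a least squares fit can be illustrated with a minimal sketch. It rests on an assumption the abstract does not state: that the fit is to the three-point recurrence x[n-1] + x[n+1] = 2 cos(ω) x[n], which any sampled sinusoid satisfies. The function name `estimate_frequency`, its parameters, and the normalized error measure are hypothetical choices made for illustration, not the authors' implementation. The purposeful nonlinearity and any filtering stage are not shown.

```python
import math


def estimate_frequency(samples, sample_rate):
    """Fit x[n-1] + x[n+1] = a * x[n] by least squares (assumed model).

    Returns (frequency_hz, error), where error lies in [0, 1] and is
    close to 0 when the signal is well described by a single sinusoid.
    Returns (None, 1.0) if the signal carries no energy.
    """
    num = 0.0  # running sum of x[n] * (x[n-1] + x[n+1])
    den = 0.0  # running sum of x[n]^2
    tot = 0.0  # running sum of (x[n-1] + x[n+1])^2

    # Each new sample only adds one term to each running sum, so the
    # estimate could be refreshed after every sample without an FFT.
    for n in range(1, len(samples) - 1):
        y = samples[n - 1] + samples[n + 1]
        num += samples[n] * y
        den += samples[n] ** 2
        tot += y * y

    if den == 0.0 or tot == 0.0:
        return None, 1.0

    a = num / den  # least squares solution for a = 2 * cos(omega)

    # Normalized residual of the fit: 1 - num^2 / (den * tot).
    error = max(0.0, 1.0 - (num * num) / (den * tot))

    # Clamp before acos so rounding or noise cannot leave its domain.
    cos_omega = max(-1.0, min(1.0, a / 2.0))
    omega = math.acos(cos_omega)  # radians per sample
    return omega * sample_rate / (2.0 * math.pi), error


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2.0 * math.pi * 220.0 * n / rate) for n in range(400)]
    print(estimate_frequency(tone, rate))
```

In this sketch a small error suggests a clean periodic component, while a large error suggests noise or silence. Using it as a voicing indicator is an interpretation of the abstract rather than a detail it confirms.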
Dec-31-2003
- Country:
  - North America
    - Cuba (0.04)
    - United States
      - Georgia > Fulton County
        - Atlanta (0.04)
      - Massachusetts > Middlesex County
        - Cambridge (0.04)
      - New Jersey > Mercer County
        - Princeton (0.04)
      - Pennsylvania > Philadelphia County
        - Philadelphia (0.04)
- Industry:
  - Leisure & Entertainment (0.68)
  - Media > Music (0.68)
- Technology: