Audio-Visual Sound Separation Via Hidden Markov Models
Hershey, John R., Casey, Michael
– Neural Information Processing Systems
It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests the utility of audiovisual information for the task of speech enhancement. We propose a method to exploit audiovisual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factorially combined, to incorporate visual lip information, and employ novel signal HMMs in which the dynamics of narrow-band and wide-band components are factorial. We avoid the combinatorial explosion in the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audiovisual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information.
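As a rough illustration of the factorial-model idea mentioned in the abstract, the sketch below combines a small clean-speech HMM and a noise HMM over log-power spectra under a log-max approximation and picks the best state pair per frame to form a speech mask. The function name, parameter layout, and the greedy per-frame inference step are assumptions for illustration only; the paper's actual method additionally uses visual lip features and factorial narrow-band/wide-band dynamics, and a different approximate inference scheme.

```python
import numpy as np

def enhance(log_spec_mix, speech_means, noise_means, speech_trans, noise_trans):
    """Illustrative sketch (not the authors' exact algorithm): estimate a
    time-frequency speech mask for a noisy mixture using two small HMMs
    combined factorially.

    log_spec_mix : (T, F) log-power spectrogram of the noisy mixture
    speech_means : (Ks, F) state-conditional mean log spectra of clean speech
    noise_means  : (Kn, F) state-conditional mean log spectra of the noise
    speech_trans, noise_trans : (Ks, Ks), (Kn, Kn) transition matrices
    """
    T, F = log_spec_mix.shape
    Ks, Kn = speech_means.shape[0], noise_means.shape[0]

    # Log-max approximation: the mixture log spectrum in each band is roughly
    # the elementwise max of the speech and noise log spectra.
    combined = np.maximum(speech_means[:, None, :], noise_means[None, :, :])  # (Ks, Kn, F)

    mask = np.zeros((T, F))
    prev_s = prev_n = None
    for t in range(T):
        # Score every (speech state, noise state) pair for this frame under an
        # isotropic Gaussian; this greedy frame-by-frame choice, biased by the
        # transition from the previous pair, stands in for full factorial inference.
        score = -0.5 * np.sum((log_spec_mix[t] - combined) ** 2, axis=-1)  # (Ks, Kn)
        if prev_s is not None:
            score += np.log(speech_trans[prev_s])[:, None] + np.log(noise_trans[prev_n])[None, :]
        s, n = np.unravel_index(np.argmax(score), (Ks, Kn))
        prev_s, prev_n = s, n
        # Keep the frequency bands where the chosen speech state dominates the noise state.
        mask[t] = speech_means[s] >= noise_means[n]
    return mask
```

Applying the returned binary mask to the mixture spectrogram and inverting gives an enhanced signal; the combinatorial cost the abstract refers to arises because exact inference must search over all Ks x Kn state pairs jointly across time, which approximations like the one sketched here sidestep.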
Dec-31-2002