An Alternative to Low-level-Synchrony-Based Methods for Speech Detection

Dec-31-2010–Neural Information Processing Systems

Determining whether someone is talking has applications in many areas, such as speech recognition, speaker diarization, social robotics, facial expression recognition, and human-computer interaction. One popular approach to this problem is audiovisual synchrony detection [10, 21, 12]: a candidate speaker is deemed to be talking if the visual signal around that speaker correlates with the auditory signal. Here we show that with the proper visual features (in this case, movements of various facial muscle groups), a very accurate detector of speech can be created that does not use the audio signal at all. Further, we show that this person-independent, visual-only detector can be used to train very accurate audio-based, person-dependent voice models. The voice model has the advantage of being able to identify when a particular person is speaking even when they are not visible to the camera (e.g., in the case of a mobile robot). Moreover, we show that a simple sensory fusion scheme between the auditory and visual models improves performance on the task of talking detection. The work here provides dramatic evidence about the efficacy of two very different approaches to multimodal speech detection on a challenging database.
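The pipeline described in the abstract (visual-only detector, then a person-dependent voice model trained from its output, then fusion of the two) can be illustrated with a minimal sketch on synthetic data. Everything specific here is an assumption made for illustration and is not taken from the paper: the diagonal-Gaussian voice model, the zero threshold on the visual score, the equal fusion weight, and the function names `train_voice_model`, `audio_score`, and `fuse`.

```python
import numpy as np

def fit_diag_gaussian(x):
    """Per-feature mean and variance (variance floored for stability)."""
    return x.mean(axis=0), x.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    """Log-density of each row of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def train_voice_model(audio_feats, visual_scores, threshold=0.0):
    """Fit speech / non-speech Gaussians for one person.

    The labels are NOT ground truth: they come from thresholding the
    visual detector's scores (hypothetical threshold of 0).
    """
    pseudo_talking = visual_scores > threshold
    speech = fit_diag_gaussian(audio_feats[pseudo_talking])
    silence = fit_diag_gaussian(audio_feats[~pseudo_talking])
    return speech, silence

def audio_score(audio_feats, model):
    """Log-likelihood ratio: positive values favour 'talking'."""
    speech, silence = model
    return log_likelihood(audio_feats, *speech) - log_likelihood(audio_feats, *silence)

def fuse(visual_scores, audio_scores, w=0.5):
    """Weighted sum of the two scores (assumed fusion rule)."""
    return w * visual_scores + (1.0 - w) * audio_scores

# Synthetic stand-ins for per-frame data (no real features are used).
rng = np.random.default_rng(0)
n_frames = 2000
talking = rng.random(n_frames) < 0.5
visual_scores = np.where(talking, 1.0, -1.0) + rng.normal(0.0, 1.0, n_frames)
audio_feats = rng.normal(0.0, 1.0, (n_frames, 4)) + 1.5 * talking[:, None]

model = train_voice_model(audio_feats, visual_scores)
a_scores = audio_score(audio_feats, model)
fused = fuse(visual_scores, a_scores)

for name, s in [("visual", visual_scores), ("audio", a_scores), ("fused", fused)]:
    print(name, "accuracy:", np.mean((s > 0) == talking))
```

Once trained, the voice model needs only audio features, so `audio_score` can still be evaluated for frames in which the person is not visible to the camera.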
