An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition

Dec-31-2003–Neural Information Processing Systems

An EM algorithm to train the model is presented, along with a Viterbi decoder that can be used to obtain the optimal state sequence as well as the alignment between the two sequences. One such task, which will be presented in this paper, is multimodal speech recognition using both a microphone and a camera recording a speaker simultaneously while he (she) speaks. It is indeed well known that seeing the speaker's face in addition to hearing his (her) voice can often improve speech intelligibility, particularly in noisy environments [7], mainly thanks to the complementarity of the visual and acoustic signals. While in the former solution, the alignment between the two sequences is decided a priori, in the latter, there is no explicit learning of the joint probability of the two sequences. In fact, the model makes it possible to desynchronize the streams by temporarily stretching one of them in order to obtain a better match between the corresponding frames. The model can thus be directly applied to the problem of audio-visual speech recognition, where, for instance, the lips sometimes start to move before any sound is heard.
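The abstract does not give the decoder's equations, so the following is only a minimal sketch of what Viterbi decoding over two asynchronous streams could look like. It assumes (these are assumptions, not details taken from the text above) that each state emits one frame of the longer stream x at every step and, with a per-state probability `eps`, also emits the next frame of the shorter stream y. The function name `async_viterbi` and all parameter names are hypothetical, and the emission likelihoods are passed in as precomputed tables.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Logarithm that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def async_viterbi(init, trans, eps, px, pxy, S):
    """Hypothetical Viterbi sketch for two streams of unequal length.

    init[i]      : P(first state = i)
    trans[j][i]  : P(next state = i | current state = j)
    eps[i]       : assumed probability that state i also emits a y frame
    px[t][i]     : p(x_t | state i)
    pxy[t][s][i] : p(x_t, y_s | state i)
    S            : number of frames in the shorter stream y

    Returns (best log score, state path, alignment), where alignment[t]
    is the index of the y frame emitted at time t, or None.
    """
    T, N = len(px), len(init)
    # V[t][s][i]: best log score at time t, in state i, with s frames of y emitted.
    V = [[[NEG_INF] * N for _ in range(S + 1)] for _ in range(T)]
    back = [[[None] * N for _ in range(S + 1)] for _ in range(T)]

    for i in range(N):
        V[0][0][i] = _log(init[i]) + _log(1 - eps[i]) + _log(px[0][i])
        if S >= 1:
            V[0][1][i] = _log(init[i]) + _log(eps[i]) + _log(pxy[0][0][i])

    for t in range(1, T):
        for s in range(S + 1):
            for i in range(N):
                # Score of emitting x_t only (the y stream is "stretched").
                only_x = _log(1 - eps[i]) + _log(px[t][i])
                # Score of emitting x_t together with y frame number s - 1.
                both = NEG_INF
                if s >= 1:
                    both = _log(eps[i]) + _log(pxy[t][s - 1][i])
                for j in range(N):
                    lt = _log(trans[j][i])
                    cand = V[t - 1][s][j] + lt + only_x
                    if cand > V[t][s][i]:
                        V[t][s][i] = cand
                        back[t][s][i] = (j, False)
                    if s >= 1:
                        cand = V[t - 1][s - 1][j] + lt + both
                        if cand > V[t][s][i]:
                            V[t][s][i] = cand
                            back[t][s][i] = (j, True)

    # All S frames of y must have been emitted by the last time step.
    i = max(range(N), key=lambda k: V[T - 1][S][k])
    best = V[T - 1][S][i]
    if best == NEG_INF:
        return best, [], []

    states, align, s = [], [], S
    for t in range(T - 1, 0, -1):
        j, emitted = back[t][s][i]
        states.append(i)
        align.append(s - 1 if emitted else None)
        if emitted:
            s -= 1
        i = j
    states.append(i)
    align.append(0 if s == 1 else None)
    states.reverse()
    align.reverse()
    return best, states, align
```

With a single state, two x frames and one y frame, the decoder places the y frame at whichever time step gives the higher joint likelihood, which is the kind of alignment decision the abstract describes. The dynamic program keeps the number of y frames emitted so far as an extra index alongside the state, so its cost grows with the product of the two stream lengths.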

artificial intelligence, sequence, speech recognition

Country:
- Europe (0.14)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Learning Graphical Models
    - Undirected Networks > Markov Models (1.00)
  - Speech > Speech Recognition (0.94)
