Analysis of Visual Features for Continuous Lipreading in Spanish

Gimeno-Gómez, David, Martínez-Hinarejos, Carlos-D.

arXiv.org Artificial Intelligence 

During a conversation, our brain is responsible for combining information obtained from multiple senses in order to improve our ability to understand the message we are perceiving. Different studies have shown the importance of visual information in these situations. Nevertheless, lipreading is a complex task whose objective is to interpret speech when audio is not available. By dispensing with a sense as crucial as hearing, it is necessary to be aware of the challenge that this lack presents. In this paper, we propose an analysis of different visual speech features with the intention of identifying which of them is the best approach to capture [...]

In our case, we employed a traditional approach to define the automatic system, in other words, a system based on Hidden Markov Models combined with Gaussian Mixture Models (GMM-HMM), an approach that has been widely used in Acoustic Speech Recognition (ASR) [6]. Although this is not the state of the art for speech-related signal recognition, it is an appropriate option for comparing the different possibilities for feature extraction. Unlike in ASR, when we deal with Visual Speech Recognition (VSR) our basic speech unit is not the phoneme, but the one known as the viseme, which is associated with the representation of the phoneme in the visual domain [7]. Unfortunately, there is no direct or one-to-one correspondence between them, which causes visual ambiguities.
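The many-to-one relationship between phonemes and visemes can be sketched as a simple lookup table. The grouping below is illustrative only (the mapping and the viseme labels are hypothetical, and real phoneme-to-viseme mappings are language-dependent), but it shows how two phonetically distinct words can collapse to the same viseme sequence, which is exactly the visual ambiguity described above:

```python
# Illustrative many-to-one phoneme-to-viseme mapping.
# Viseme labels (V_bilabial, ...) are made up for this sketch;
# real mappings depend on the language and the chosen viseme set.
PHONEME_TO_VISEME = {
    # Bilabials look identical on the lips.
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    # Labiodentals share the same lip-teeth configuration.
    "f": "V_labiodental", "v": "V_labiodental",
    # Velars are barely visible from outside the mouth.
    "k": "V_velar", "g": "V_velar",
}

def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence."""
    return [PHONEME_TO_VISEME.get(p, "V_other") for p in phonemes]

# Two different phoneme strings, one identical viseme string:
# the visual channel alone cannot distinguish them.
print(to_visemes(["p", "a", "m"]))  # ['V_bilabial', 'V_other', 'V_bilabial']
print(to_visemes(["b", "a", "m"]))  # ['V_bilabial', 'V_other', 'V_bilabial']
```

Because several phonemes map to one viseme, a viseme-level recognizer has fewer, coarser classes than a phoneme-level one, and the language model must resolve the resulting homophene ambiguities.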