Lip to Speech Synthesis with Visual Context Attentional GAN

Neural Information Processing Systems 

Specifically, the proposed VCA-GAN synthesizes speech from local lip visual features by learning a viseme-to-phoneme mapping, while global visual context is embedded into the intermediate layers of the generator to resolve the ambiguity in the mapping induced by homophenes.
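The core idea of injecting global visual context into intermediate generator features can be sketched with a simple cross-attention step: each intermediate speech feature attends over global visual context vectors and the attended context is added back residually. This is a minimal NumPy illustration of that mechanism only; the function name, shapes, and residual injection are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def visual_context_attention(speech_feats, global_visual):
    # speech_feats:  (T, d) intermediate generator features (queries)
    # global_visual: (N, d) global visual context embeddings (keys/values)
    d = speech_feats.shape[-1]
    scores = speech_feats @ global_visual.T / np.sqrt(d)  # (T, N)
    attn = softmax(scores, axis=-1)                       # rows sum to 1
    context = attn @ global_visual                        # (T, d)
    return speech_feats + context                         # residual injection

# Toy shapes, purely for illustration.
rng = np.random.default_rng(0)
T, N, d = 8, 5, 16
local_feats = rng.standard_normal((T, d))   # features from local lip regions
global_ctx = rng.standard_normal((N, d))    # global visual context
out = visual_context_attention(local_feats, global_ctx)
print(out.shape)  # (8, 16)
```

The residual form means the generator can fall back on local lip features alone, while the attention term supplies the global context that disambiguates homophenes.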
