Lip to Speech Synthesis with Visual Context Attentional GAN
–Neural Information Processing Systems
Specifically, the proposed VCA-GAN synthesizes speech from local lip visual features by learning a viseme-to-phoneme mapping function, while global visual context is embedded into the intermediate layers of the generator to resolve the ambiguity that homophenes introduce into this mapping.
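To make the idea of combining local lip features with global visual context concrete, here is a minimal sketch in plain Python. It is not the VCA-GAN implementation: it assumes the fusion is a scaled dot-product attention in which each per-frame local feature queries a set of clip-level context vectors and the attended summary is added back to the local feature. The function names, the feature shapes, and the additive fusion are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_global_context(local_feats, context_feats):
    """Fuse each local feature with an attention-weighted context summary.

    local_feats:   list of T vectors (one per video frame), each of length d
    context_feats: list of S vectors describing the whole clip, each of length d
    Returns a list of T fused vectors of length d.

    Hypothetical helper: the attention form and the additive fusion are
    assumptions made for illustration, not details taken from the paper.
    """
    d = len(local_feats[0])
    fused = []
    for query in local_feats:
        # Scaled dot-product score between this frame and every context vector.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in context_feats
        ]
        weights = softmax(scores)
        # Weighted sum of the context vectors (the "global" summary for this frame).
        summary = [
            sum(w * vec[i] for w, vec in zip(weights, context_feats))
            for i in range(d)
        ]
        # Residual-style fusion: keep the local feature, add the context summary.
        fused.append([q + s for q, s in zip(query, summary)])
    return fused


# Toy inputs: three frames and two context vectors, all of dimension 2.
local = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [[0.5, 0.5], [2.0, -1.0]]
fused = attend_to_global_context(local, context)
```

In this sketch, two frames with similar lip shapes (similar local features) can still receive different fused features if the surrounding context differs, which is the intuition behind using global context to separate homophenes.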
Oct-2-2025, 13:24:12 GMT
- Country:
- Asia > Myanmar
- Tanintharyi Region > Dawei (0.04)
- Europe > Italy
- Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States (0.04)
- South America > Chile
- Genre:
- Research Report (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (0.93)
- Natural Language (1.00)
- Speech (1.00)
- Vision (1.00)