Zero-shot personalized lip-to-speech synthesis with face image based voice control
