Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Open in new window