Meta's Voicebox AI is a Dall-E for text-to-speech

Engadget 

Today, we are one step closer to the immortal celebrity future we have long been promised (since April). Meta has unveiled Voicebox, its generative text-to-speech model that promises to do for the spoken word what ChatGPT and Dall-E, respectively, did for text and image generation. Essentially, it's a text-to-output generator just like GPT or Dall-E, except that instead of creating prose or pretty pictures, it spits out audio clips. Meta defines the system as "a non-autoregressive flow-matching model trained to infill speech, given audio context and text." It's been trained on more than 50,000 hours of unfiltered audio.
