Facebook's voice synthesis AI generates speech in 500 milliseconds

May-16-2020, 19:52:50 GMT–#artificialintelligence

Facebook today unveiled a highly efficient AI text-to-speech (TTS) system that can be hosted in real time on regular processors. Facebook says the system, which produces one second of audio in 500 milliseconds, was paired with a new data collection approach that uses a language model for curation, and that the combination enabled it to create a British-accented voice in six months, compared with over a year for previous voices. Most modern AI TTS systems require graphics cards, field-programmable gate arrays (FPGAs), or custom-designed AI chips such as Google's tensor processing units (TPUs) to run, to train, or both. For instance, a recently detailed Google AI system was trained across 32 TPUs in parallel. Synthesizing a single second of humanlike audio can require outputting as many as 24,000 samples, and sometimes even more.
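
To put the reported figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration built on the two numbers quoted above (500 milliseconds of compute per second of audio, and 24,000 samples per second); the function names are hypothetical, and the per-sample figure assumes compute time is spread evenly across samples, which may not match how the actual system generates audio.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Ratio of compute time to audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def per_sample_budget_us(synthesis_seconds: float, sample_count: int) -> float:
    """Average microseconds of compute per sample (assumes an even spread)."""
    if sample_count <= 0:
        raise ValueError("sample_count must be positive")
    return synthesis_seconds / sample_count * 1e6


# Figures quoted in the article: 0.5 s of compute for 1 s of audio,
# and up to 24,000 samples in that one second of audio.
rtf = real_time_factor(0.5, 1.0)
budget = per_sample_budget_us(0.5, 24_000)

print(f"real-time factor: {rtf:.2f}")
print(f"average time per sample: {budget:.1f} microseconds")
```

Under these assumptions the system runs at a real-time factor of 0.5, that is, twice as fast as playback, leaving roughly 21 microseconds of compute per output sample on average.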

artificial intelligence, facebook, machine learning

Industry:
- Information Technology > Services (0.68)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Machine Learning > Neural Networks (0.50)
    - Speech > Speech Synthesis (0.36)
