Facebook details wav2vec, an AI algorithm that uses raw audio to improve speech recognition


Automatic speech recognition, or ASR, is a foundational part of not only assistants like Apple's Siri, but also dictation software such as Nuance's Dragon and customer support platforms like Google's Contact Center AI. It enables machines to parse utterances for key phrases and words, and it allows them to distinguish people by their intonations and pitches. Perhaps it goes without saying that ASR is an intense area of study for Facebook, whose conversational tech powers Portal's speech recognition and which is broadening its use of AI to classify content on its platform. To this end, at the InterSpeech conference earlier this year the Menlo Park company detailed wav2vec, a novel machine learning algorithm that improves ASR accuracy by using raw, untranscribed audio as training data. Facebook claims that wav2vec achieves state-of-the-art results on a popular benchmark while using two orders of magnitude less training data, and that it demonstrates a 22% error reduction over the leading character-based speech recognition system, Deep Speech 2. Wav2vec was made available earlier this year as an extension to the open source modeling toolkit fairseq, and Facebook says it plans to use wav2vec to provide better audio data representations for keyword spotting and acoustic event detection.
