Meta claims its AI improves speech recognition quality by reading lips

People perceive speech both by listening to it and by watching speakers' lip movements. In fact, studies show that visual cues play a key role in language learning. AI speech recognition systems, by contrast, are built mostly or entirely on audio. They also need a substantial amount of training data, typically tens of thousands of hours of recordings. To investigate whether visuals (specifically, footage of mouth movements) can improve the performance of speech recognition systems, researchers at Meta (formerly Facebook) developed Audio-Visual Hidden Unit BERT (AV-HuBERT), a framework that learns to understand speech by both watching and hearing people speak.
