AITopics | pronunciation model

pronunciation model

CMU's ASR2K Pipeline Recognizes Speech in 1909 Languages Without Audio

#artificialintelligence, Sep-15-2022, 05:03:58 GMT

AI-powered speech recognition systems have made great progress in recent years, with speech-to-text processing now so capable that errors are the occasional exception. Most contemporary models for this task, however, require massive amounts of labelled training data, which is simple enough to source for English, Chinese, and other popular languages but challenging to obtain for the low-resource languages that make up the majority of the world's 8,000 languages. To address this issue, a Carnegie Mellon University research team has developed a speech recognition pipeline that can recognize 1909 languages without any audio for the target language. Introduced in the paper ASR2K: Speech Recognition for Around 2000 Languages Without Audio, the ASR2K pipeline achieves a 45 percent character error rate (CER) and a 69 percent word error rate (WER) on the CMU Wilderness dataset when using 10,000 raw text utterances. The proposed pipeline comprises separate acoustic, pronunciation, and language models.
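For readers unfamiliar with the CER and WER figures quoted above, the following is a minimal sketch of how such error rates are commonly computed: the Levenshtein edit distance between a reference transcript and a system hypothesis, divided by the reference length, counted in characters for CER and in words for WER. The function names are illustrative assumptions and are not taken from the ASR2K paper, whose exact scoring setup is not described here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

As a usage example, `wer("the cat sat", "the cat sit")` counts one substituted word out of three, giving 1/3, while `cer` on the same pair counts one substituted character out of eleven. Both rates are error measures, so lower is better, and either can exceed 1.0 when the hypothesis contains many insertions.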

asr2k pipeline recognize speech, language model, pronunciation model, (9 more...)

Country: Asia > South Korea > Incheon > Incheon (0.06)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

ASR2K: Speech Recognition for Around 2000 Languages without Audio

Li, Xinjian, Metze, Florian, Mortensen, David R, Black, Alan W, Watanabe, Shinji

arXiv.org Artificial Intelligence, Sep-6-2022

Most recent speech recognition models rely on large supervised datasets, which are unavailable for many low-resource languages. In this work, we present a speech recognition pipeline that does not require any audio for the target language. The only assumption is that we have access to raw text datasets or a set of n-gram statistics. Our speech pipeline consists of three components: acoustic, pronunciation, and language models. Unlike the standard pipeline, our acoustic and pronunciation models use multilingual models without any supervision. The language model is built using n-gram statistics or the raw text dataset. We build speech recognition for 1909 languages by combining it with Crubadan: a large endangered languages n-gram database. Furthermore, we test our approach on 129 languages across two datasets: Common Voice and CMU Wilderness dataset. We achieve 50% CER and 74% WER on the Wilderness dataset with Crubadan statistics only and improve them to 45% CER and 69% WER when using 10000 raw text utterances.
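The abstract states that the language model is built from n-gram statistics or raw text. As a rough illustration of that idea only, the sketch below trains a character-level n-gram model from a handful of raw text lines and scores new strings with add-one smoothing. The function names, the padding symbols `^` and `$`, and the smoothing choice are all assumptions made for this example; the abstract does not specify how the paper's language model is actually constructed.

```python
import math
from collections import Counter


def train_char_ngram(lines, n=2):
    """Collect character n-gram counts from raw text lines.
    '^' and '$' are hypothetical start/end padding symbols."""
    ngram_counts = Counter()    # (context, next_char) -> count
    context_counts = Counter()  # context -> count
    vocab = set()
    for line in lines:
        padded = "^" * (n - 1) + line + "$"
        vocab.update(padded)
        for i in range(n - 1, len(padded)):
            context = padded[i - n + 1:i]
            ngram_counts[(context, padded[i])] += 1
            context_counts[context] += 1
    return ngram_counts, context_counts, vocab


def log_prob(text, model, n=2):
    """Log-probability of text under the model, with add-one smoothing
    so that unseen n-grams receive a small non-zero probability."""
    ngram_counts, context_counts, vocab = model
    padded = "^" * (n - 1) + text + "$"
    total = 0.0
    for i in range(n - 1, len(padded)):
        context = padded[i - n + 1:i]
        numerator = ngram_counts[(context, padded[i])] + 1
        denominator = context_counts[context] + len(vocab)
        total += math.log(numerator / denominator)
    return total
```

A model trained with `train_char_ngram(["abab", "abba"])` assigns a higher log-probability to `"abab"` than to `"bbbb"`, because the former reuses n-grams seen in training. In a recognition pipeline, scores of this kind can be used to prefer candidate transcriptions that look like plausible text in the target language.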

dataset, pronunciation model, recognition, (15 more...)

arXiv:2209.02842

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Africa > Sub-Saharan Africa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

The 3 Deep Learning Frameworks For End-to-End Speech Recognition That Power Your Devices

#artificialintelligence, Sep-21-2019, 18:52:41 GMT

Speech recognition is invading our lives. It's built into our phones (Siri), our game consoles (Kinect), our smartwatches (Apple Watch), and even our homes (Amazon Echo). But speech recognition has been around for decades, so why is it only now hitting the mainstream? The reason is that deep learning finally made speech recognition accurate enough to be useful outside of carefully controlled environments. In this blog post, we'll learn how to perform speech recognition with 3 different implementations of popular deep learning frameworks.

deep learning framework, end-to-end speech recognition, sequence, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
