AITopics | pronunciation lexicon

Investigating Transcription Normalization in the Faetar ASR Benchmark

Peckham, Leo, Ong, Michael, Nagy, Naomi, Dunbar, Ewan

arXiv.org Artificial Intelligence, Aug-21-2025

We provide a small but important update on the Faetar Speech Recognition Benchmark [1]. The benchmark, initially released as a challenge task (with test data embargoed), is intended to teach us more about the domain of "dirty" low-resource ASR. We identified two major hurdles. First, due to an unfortunate error, one of the baselines for the constrained ASR task, which interested most challenge participants, had an incorrect phone error rate that was much lower than it should have been: the reported result in fact came from a different, unconstrained model. We felt the impact of this as potential participants hesitated to submit when they were unable to beat this incorrect number. This has since been corrected in the documentation.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv: 2508.11771

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.50)

Automatic Text Pronunciation Correlation Generation and Application for Contextual Biasing

Cheng, Gaofeng, Lu, Haitian, Yang, Chengxu, Wang, Xuyang, Li, Ta, Yan, Yonghong

arXiv.org Artificial Intelligence, Jan-1-2025

Effectively distinguishing the pronunciation correlations between different written texts is a significant issue in linguistic acoustics. Traditionally, such pronunciation correlations are obtained through manually designed pronunciation lexicons. In this paper, we propose a data-driven method to automatically acquire these pronunciation correlations, called automatic text pronunciation correlation (ATPC). The supervision required for this method is consistent with the supervision needed for training end-to-end automatic speech recognition (E2E-ASR) systems, i.e., speech and corresponding text annotations. First, the iteratively-trained timestamp estimator (ITSE) algorithm is employed to align the speech with its corresponding annotated text symbols. Then, a speech encoder is used to convert the speech into speech embeddings. Finally, we compare the speech embedding distances of different text symbols to obtain ATPC. Experimental results on Mandarin show that ATPC enhances E2E-ASR performance in contextual biasing and holds promise for dialects or languages lacking artificial pronunciation lexicons.
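
The final step of the abstract above (comparing embedding distances between text symbols) can be illustrated with a minimal sketch. It is not the authors' implementation: the averaging of embeddings per symbol, the use of cosine distance, the function names, and the toy three-dimensional vectors are all assumptions made for illustration. It also assumes that the alignment and encoding steps have already produced (symbol, embedding) pairs.

```python
import math
from collections import defaultdict


def mean_embeddings(aligned_frames):
    """Average all embedding vectors that were aligned to the same text symbol."""
    sums = {}
    counts = defaultdict(int)
    for symbol, vec in aligned_frames:
        if symbol not in sums:
            sums[symbol] = [0.0] * len(vec)
        sums[symbol] = [s + v for s, v in zip(sums[symbol], vec)]
        counts[symbol] += 1
    return {sym: [s / counts[sym] for s in total] for sym, total in sums.items()}


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def pronunciation_distances(aligned_frames):
    """Pairwise distance between the mean embeddings of all text symbols."""
    centroids = mean_embeddings(aligned_frames)
    symbols = sorted(centroids)
    return {
        (p, q): cosine_distance(centroids[p], centroids[q])
        for i, p in enumerate(symbols)
        for q in symbols[i + 1:]
    }


# Toy (symbol, embedding) pairs; the vectors are invented, not encoder output.
frames = [
    ("是", [0.9, 0.1, 0.0]),
    ("是", [1.0, 0.0, 0.1]),
    ("事", [0.9, 0.2, 0.0]),
    ("马", [0.0, 1.0, 0.2]),
]
distances = pronunciation_distances(frames)
```

In this toy data the two similarly pronounced symbols end up much closer to each other than either is to the third symbol, which is the kind of signal a contextual biasing component could use in place of a hand-built lexicon.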

correlation, pronunciation correlation, pronunciation lexicon, (12 more...)

arXiv: 2501.00804

Country:

Asia > China (0.05)
South America > Paraguay (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Using Kaldi for Automatic Speech Recognition of Conversational Austrian German

Linke, Julian, Wepner, Saskia, Kubin, Gernot, Schuppler, Barbara

arXiv.org Artificial Intelligence, Jan-16-2023

As dialogue systems are becoming more and more interactional and social, the accurate automatic speech recognition (ASR) of conversational speech is also of increasing importance. This shifts the focus from short, spontaneous, task-oriented dialogues to the much higher complexity of casual face-to-face conversations. However, the collection and annotation of such conversations is a time-consuming process and data is sparse for this specific speaking style. This paper presents ASR experiments with read and conversational Austrian German as target. In order to deal with having only limited resources available for conversational German and, at the same time, with a large variation among speakers with respect to pronunciation characteristics, we improve a Kaldi-based ASR system by incorporating a (large) knowledge-based pronunciation lexicon, while exploring different data-based methods to restrict the number of pronunciation variants for each lexical entry. We achieve a best WER of 0.4% on Austrian German read speech and a best average WER of 48.5% on conversational speech. We find that using our best pronunciation lexicon achieves performance similar to that obtained by increasing the size of the data used for the language model by approx. 360% to 760%. Our findings indicate that for low-resource scenarios, despite the general trend in speech technology towards using data-based methods only, knowledge-based approaches are a successful, efficient method.
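
One simple data-based way to restrict the number of pronunciation variants per lexical entry is to keep only the variants that occur often enough in aligned training data. The sketch below illustrates that idea only; the abstract does not say which methods the authors used, and the function name, thresholds, phone strings, and counts are all hypothetical.

```python
def prune_variants(variant_counts, min_rel_freq=0.1, max_variants=3):
    """Keep the most frequent pronunciation variants of each word.

    variant_counts maps word -> {pronunciation: count observed in alignments}.
    A variant is kept if its relative frequency reaches min_rel_freq; at most
    max_variants are kept, and the top variant is always retained.
    """
    pruned = {}
    for word, counts in variant_counts.items():
        total = sum(counts.values())
        # Sort by descending count, then alphabetically for a stable order.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        kept = [
            pron for pron, n in ranked
            if total > 0 and n / total >= min_rel_freq
        ]
        if not kept and ranked:
            kept = [ranked[0][0]]
        pruned[word] = kept[:max_variants]
    return pruned


# Invented counts for two German words; phone symbols are illustrative only.
counts = {
    "haben": {"h a: b @ n": 12, "h a: b m": 30, "h a m": 55, "h a b n": 3},
    "nicht": {"n I C t": 20, "n I C": 40, "n e t": 2},
}
lexicon = prune_variants(counts)
```

The threshold trades coverage of real pronunciation variation against added confusability in the decoder, so it would normally be tuned on held-out data.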

artificial intelligence, speech, speech recognition, (16 more...)

arXiv: 2301.06475

Country:

Europe > Austria > Styria > Graz (0.05)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
Europe > Germany > Lower Saxony > Oldenburg (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon

Veisi, Hadi, Hosseini, Hawre, Mohammadamini, Mohammad, Fathy, Wirya, Mahmudi, Aso

arXiv.org Artificial Intelligence, Feb-15-2021

In this paper, we introduce the first large vocabulary speech recognition system (LVSR) for the Central Kurdish language, named Jira. The Kurdish language is an Indo-European language spoken by more than 30 million people in several countries, but due to the lack of speech and text resources, there is no speech recognition system for this language. To fill this gap, we introduce the first speech corpus and pronunciation lexicon for the Kurdish language. Regarding the speech corpus, we designed a sentence collection in which the ratio of di-phones resembles the real data of the Central Kurdish language. The designed sentences are uttered by 576 speakers in a controlled environment with noise-free microphones (called AsoSoft Speech-Office) and in the Telegram social network environment using mobile phones (denoted as AsoSoft Speech-Crowdsourcing), resulting in 43.68 hours of speech. In addition, a test set including 11 different document topics is designed and recorded in the two corresponding speech conditions (i.e., Office and Crowdsourcing). Furthermore, a 60K pronunciation lexicon is prepared in this research, in which we faced several challenges and proposed solutions for them. The Kurdish language has several dialects and sub-dialects that result in many lexical variations. Our methods for script standardization of lexical variations and automatic pronunciation of the lexicon tokens are presented in detail. To set up the recognition engine, we used the Kaldi toolkit. A statistical tri-gram language model extracted from the AsoSoft text corpus is used in the system. Several standard recipes including HMM-based models (i.e., mono, tri1, tr2, tri2, tri3), SGMM, and DNN methods are used to generate the acoustic model. These methods are trained with AsoSoft Speech-Office, AsoSoft Speech-Crowdsourcing, and a combination of them. The best performance is achieved by the SGMM acoustic model, which yields an average word error rate of 13.9% (on different document topics) and 4.9% for the general topic.
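
For readers unfamiliar with the metric reported in these abstracts, word error rate is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and example strings are illustrative and are not taken from any of the papers or from Kaldi's scoring scripts.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
wer = word_error_rate("a b c d", "a x c")
```

Applying the same computation to phone sequences instead of words gives a phone error rate.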

central kurdish, corpus, kurdish, (12 more...)

arXiv: 2102.07412

Country:

Asia > Middle East > Iraq > Kurdistan Region (0.05)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Middle East > Syria (0.04)
(8 more...)

Genre: Research Report (0.82)

Industry:

Government (0.46)
Media (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)