AITopics | linguistic data consortium

Collaborating Authors

linguistic data consortium

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Non-native Children's Automatic Speech Assessment Challenge (NOCASA)

Getman, Yaroslav, Grósz, Tamás, Kurimo, Mikko, Salvi, Giampiero

arXiv.org Artificial IntelligenceAug-14-2025

This paper presents the "Non-native Children's Automatic Speech Assessment" (NOCASA) - a data competition part of the IEEE MLSP 2025 conference. NOCASA challenges participants to develop new systems that can assess single-word pronunciations of young second language (L2) learners as part of a gamified pronunciation training app. To achieve this, several issues must be addressed, most notably the limited nature of available training data and the highly unbalanced distribution among the pronunciation level categories. To expedite the development, we provide a pseudo-anonymized training data (TeflonNorL2), containing 10,334 recordings from 44 speakers attempting to pronounce 205 distinct Norwegian words, human-rated on a 1 to 5 scale (number of stars that should be given in the game). In addition to the data, two already trained systems are released as official baselines: an SVM classifier trained on the ComParE_16 acoustic feature set and a multi-task wav2vec 2.0 model. The latter achieves the best performance on the challenge test set, with an unweighted average recall (UAR) of 36.37%.

artificial intelligence, machine learning, pronunciation, (17 more...)

arXiv.org Artificial Intelligence

2504.20678

Country:

Europe (0.68)
Asia (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.34)

Add feedback

Enhancing Aviation Communication Transcription: Fine-Tuning Distil-Whisper with LoRA

Mirzaei, Shokoufeh, Arzate, Jesse, Vijay, Yukti

arXiv.org Artificial IntelligenceMar-13-2025

Transcription of aviation communications has several applications, from assisting air traffic controllers in identifying the accuracy of read-back errors to search and rescue operations. Recent advances in artificial intelligence have provided unprecedented opportunities for improving aviation communication transcription tasks. OpenAI's Whisper is one of the leading automatic speech recognition models. However, fine-tuning Whisper for aviation communication transcription is not computationally efficient. Thus, this paper aims to use a Parameter-Efficient Fine-tuning method called Low-Rank Adaptation to fine-tune a more computationally efficient version of Whisper, distil-Whisper. To perform the fine-tuning, we used the Air Traffic Control Corpus dataset from the Linguistic Data Consortium, which contains approximately 70 hours of controller and pilot transmissions near three major airports in the US. The objective was to reduce the word error rate to enhance accuracy in the transcription of aviation communication. First, starting with an initial set of hyperparameters for LoRA (Alpha = 64 and Rank = 32), we performed a grid search. We applied a 5-fold cross-validation to find the best combination of distil-Whisper hyperparameters. Then, we fine-tuned the model for LoRA hyperparameters, achieving an impressive average word error rate of 3.86% across five folds. This result highlights the model's potential for use in the cockpit.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.22692

Country:

North America > United States > Texas > Tarrant County > Fort Worth (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > United States > California > Los Angeles County > Pomona (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Air (1.00)
Transportation > Infrastructure & Services > Airport (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.76)

Add feedback

ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus

Hamed, Injy, Eryani, Fadhl, Palfreyman, David, Habash, Nizar

arXiv.org Artificial IntelligenceMar-26-2024

The corpus comprises twelve hours of Zoom meetings involving multiple speakers role-playing a work situation where Students brainstorm ideas for a certain topic and then discuss it with an Interlocutor. The meetings cover different topics and are divided into phases with different language setups. The corpus presents a challenging set for automatic speech recognition (ASR), including two languages (Arabic and English) with Arabic spoken in multiple variants (Modern Standard Arabic, Gulf Arabic, and Egyptian Arabic) and English used with various accents. Adding to the complexity of the corpus, there is also code-switching between these languages and dialects. As part of our work, we take inspiration from established sets of transcription guidelines to present a set of guidelines handling issues of conversational speech, code-switching and orthography of both languages. We further enrich the corpus with two layers of annotations; (1) dialectness level annotation for the portion of the corpus where mixing occurs between different variants of Arabic, and (2) automatic morphological annotations, including tokenization, lemmatization, and part-of-speech tagging.

arabic, corpus, utterance, (16 more...)

arXiv.org Artificial Intelligence

2403.18182

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Oceania > Australia (0.04)
Asia > Indonesia > Bali (0.04)
(10 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech

Wotherspoon, Shannon, Hartmann, William, Snover, Matthew

arXiv.org Artificial IntelligenceMar-25-2024

This paper introduces a set of English translations for a 123-hour subset of the CallHome Mandarin Chinese data and the HKUST Mandarin Telephone Speech data for the task of speech translation. Paired source-language speech and target-language text is essential for training end-to-end speech translation systems and can provide substantial performance improvements for cascaded systems as well, relative to training on more widely available text data sets. We demonstrate that fine-tuning a general-purpose translation model to our Mandarin-English conversational telephone speech training set improves target-domain BLEU by more than 8 points, highlighting the importance of matched training data.

linguistic data consortium, training data, translation, (9 more...)

arXiv.org Artificial Intelligence

2404.11619

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Middle East > Jordan (0.05)
Asia > China (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development

Marmor, Yanir, Misgav, Kinneret, Lifshitz, Yair

arXiv.org Artificial IntelligenceJul-17-2023

We introduce "ivrit.ai", a comprehensive Hebrew speech dataset, addressing the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) technology in Hebrew. With over 3,300 speech hours and a over a thousand diverse speakers, ivrit.ai offers a substantial compilation of Hebrew speech across various contexts. It is delivered in three forms to cater to varying research needs: raw unprocessed audio; data post-Voice Activity Detection, and partially transcribed data. The dataset stands out for its legal accessibility, permitting use at no cost, thereby serving as a crucial resource for researchers, developers, and commercial entities. ivrit.ai opens up numerous applications, offering vast potential to enhance AI capabilities in Hebrew. Future efforts aim to expand ivrit.ai further, thereby advancing Hebrew's standing in AI research and technology.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2307.0872

Country:

North America > United States > Pennsylvania (0.04)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Industry:

Law (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text

Gaser, Marwa, Mager, Manuel, Hamed, Injy, Habash, Nizar, Abdennadher, Slim, Vu, Ngoc Thang

arXiv.org Artificial IntelligenceApr-30-2023

Data sparsity is one of the main challenges posed by code-switching (CS), which is further exacerbated in the case of morphologically rich languages. For the task of machine translation (MT), morphological segmentation has proven successful in alleviating data sparsity in monolingual contexts; however, it has not been investigated for CS settings. In this paper, we study the effectiveness of different segmentation approaches on MT performance, covering morphology-based and frequency-based segmentation techniques. We experiment on MT from code-switched Arabic-English to English. We provide detailed analysis, examining a variety of conditions, such as data size and sentences with different degrees of CS. Empirical results show that morphology-aware segmenters perform the best in segmentation tasks but under-perform in MT. Nevertheless, we find that the choice of the segmentation setup to use for MT is highly dependent on the data size. For extreme low-resource scenarios, a combination of frequency and morphology-based segmentations is shown to perform the best. For more resourced settings, such a combination does not bring significant improvements over the use of frequency-based segmentation.

machine learning, natural language, segmentation, (16 more...)

arXiv.org Artificial Intelligence

2210.0699

Country:

Europe (1.00)
Asia (1.00)
Africa > Middle East > Egypt (0.28)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.46)

Industry: Energy > Oil & Gas (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback