vowel
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.40)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.95)
- Questionnaire & Opinion Survey (0.69)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
A Rhythm-Aware Phrase Insertion for Classical Arabic Poetry Composition
Elzohbi, Mohamad, Zhao, Richard
This paper presents a methodology for inserting phrases in Arabic poems to conform to a specific rhythm using ByT5, a byte-level multilingual transformer-based model. Our work discusses a rule-based grapheme-to-beat transformation tailored for extracting the rhythm from fully diacritized Arabic script. Our approach employs a conditional denoising objective to fine-tune ByT5, where the model reconstructs masked words to match a target rhythm. We adopt a curriculum learning strategy, pre-training on a general Arabic dataset before fine-tuning on a poetic dataset, and explore cross-lingual transfer from English to Arabic. Experimental results demonstrate that our models achieve high rhythmic alignment while maintaining semantic coherence. The proposed model has the potential to be used in co-creative applications in the process of composing classical Arabic poems.
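A rule-based grapheme-to-beat transform of the kind the abstract describes can be sketched as follows. This is my own simplification, not the authors' algorithm: in Arabic prosody, a letter carrying a short vowel (haraka) is "moving" (here 1), while a letter with sukun, a bare long-vowel letter, or the latent nun of tanwin is "still" (0).

```python
HARAKAT = {"\u064E", "\u064F", "\u0650"}        # fatha, damma, kasra
TANWIN = {"\u064B", "\u064C", "\u064D"}         # -an, -un, -in
SUKUN, SHADDA = "\u0652", "\u0651"
LONG_VOWELS = {"\u0627", "\u0648", "\u064A"}    # alif, waw, ya'

def graphemes_to_beats(text: str) -> str:
    """Map fully diacritized Arabic to a beat string of 1 (moving) / 0 (still)."""
    chars = [c for c in text if not c.isspace()]
    beats, i = [], 0
    while i < len(chars):
        # collect the combining marks attached to this base letter
        j = i + 1
        marks = []
        while j < len(chars) and chars[j] in HARAKAT | TANWIN | {SUKUN, SHADDA}:
            marks.append(chars[j])
            j += 1
        if SHADDA in marks:
            beats.append("0")                   # first half of a geminate is still
        if any(m in HARAKAT for m in marks):
            beats.append("1")
        elif any(m in TANWIN for m in marks):
            beats.append("10")                  # short vowel + unwritten still nun
        elif SUKUN in marks or chars[i] in LONG_VOWELS:
            beats.append("0")
        i = j
    return "".join(beats)
```

For example, `graphemes_to_beats("كَتَبَ")` (kataba, three vowelled letters) yields `"111"`, and `graphemes_to_beats("قَالَ")` (qāla, with a long alif) yields `"101"`. Real ʿarūḍ scansion handles more cases (hamza forms, alif maqsura, elision), so treat this as a starting point only.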
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.15)
- North America > United States > Indiana (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Research Report > New Finding (0.34)
- Overview > Innovation (0.34)
Sperm whales use vowels like humans, new study finds
Scientists decoding whale clicks found patterns that echo the building blocks of human speech. The marine mammals have a complex communication system that scientists are working to decode. A new study discovered a fresh component of their varied vocalizations that could hint at potential language structures. Sperm whales exhibit patterns similar to human vowels and diphthongs (a connected pair of vowels in a word, such as the "oi" in "coin").
- South America > Brazil (0.05)
- North America > United States > California > Alameda County > Berkeley (0.05)
- North America > Dominica (0.05)
- (2 more...)
Dynamical model parameters from ultrasound tongue kinematics
Kirkham, Sam, Strycharczuk, Patrycja
A common approach is to cast this problem in terms of a dynamical system with point attractor dynamics, where a small number of parameters drive the vocal tract to a stable equilibrium position (Browman and Goldstein, 1986; Fowler, 1980; Gafos, 2006; Saltzman and Munhall, 1989; Tilsen, 2016). A standard model in this framework is the linear harmonic oscillator, m x'' + b x' + k x = 0 (1), where m is mass (typically m = 1), k is a stiffness coefficient, and b is a damping coefficient, usually set to the critically damped value b = 2√(mk). Gestural activation can be governed by step activation, with gestural parameters changing instantaneously at the point of activation and remaining constant over the activation interval. In this study we focus on whether the parameters of a linear harmonic oscillator can be estimated from ultrasound tongue imaging data, which we compare with the more common method of fitting to electromagnetic articulography (EMA) data. A major barrier to this goal is that the linear harmonic oscillator is known to be a poor fit to empirical articulatory trajectories, as it predicts overly short time-to-peak velocity, meaning that it is inappropriate for evaluating how well the model can fit different data modalities. There are three common solutions to this issue. The first allows gestural activation to vary over time (Byrd and Saltzman, 1998), which adds extrinsic complexity to the model. The second is a nonlinear model, such as adding a cubic term to the linear model (Kirkham, 2025b; Sorensen and Gafos, 2016), or novel nonlinear models (Stern and Shaw, 2025). The third is to abandon oscillatory models and develop new time-dependent models.
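The critically damped oscillator and its "overly short time-to-peak velocity" can be illustrated directly from the closed-form solution. This sketch is mine, not the authors' code: with b = 2√(mk) the characteristic equation has a repeated root -ω (ω = √(k/m)), and a gesture starting at rest peaks in velocity at t = 1/ω, early in the movement.

```python
import numpy as np

def critically_damped(t, x0, target, k, m=1.0):
    """Trajectory of m x'' + b x' + k x = 0 (b = 2*sqrt(m*k)), from rest at x0."""
    omega = np.sqrt(k / m)            # repeated root of the characteristic equation
    A = x0 - target
    return target + (A + omega * A * t) * np.exp(-omega * t)

t = np.linspace(0.0, 1.0, 1001)
x = critically_damped(t, x0=0.0, target=1.0, k=100.0)   # omega = 10
v = np.gradient(x, t)
t_peak = t[np.argmax(np.abs(v))]      # analytic peak velocity is at 1/omega = 0.1 s
```

Here the movement takes roughly 0.5-0.8 s to settle, yet velocity peaks at 0.1 s, i.e. well before the midpoint; empirical articulatory gestures peak much later, which is the mismatch the abstract refers to.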
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.34)
AI and the End of Accents
I sound Korean, because I am Korean. Can AI make me sound American? It all began, as these things often do, with an Instagram ad. "No one tells you this if you're an immigrant, but accent discrimination is a real thing," said a woman in the video. Her own accent is faintly Eastern European, so subtle it took me a few playbacks to notice.
- Asia > China (0.16)
- North America > United States > Ohio (0.05)
- North America > United States > New York (0.05)
- (8 more...)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
- Information Technology > Communications > Social Media (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
IASC: Interactive Agentic System for ConLangs
Taguchi, Chihiro, Sproat, Richard
We present a system that uses LLMs as a tool in the development of Constructed Languages. The system is modular in that one first creates a target phonology for the language using an agentic approach that refines its output at each step with commentary feedback on its previous attempt. Next, a set of sentences is 'translated' from their English originals into a morphosyntactic markup that reflects the word order and morphosyntactic feature specifications of the desired target language, with affixes represented as morphosyntactic feature bundles. From this translated corpus, a lexicon is constructed using the phonological model and the set of morphemes (stems and affixes) extracted from the 'translated' sentences. The system is then instructed to provide an orthography for the language, using an existing script such as Latin or Cyrillic. Finally, the system writes a brief grammatical handbook of the language. The system can also translate further sentences into the target language. Our goal is twofold. First, we hope that these tools will be fun to use for creating artificially constructed languages. Second, we are interested in exploring what LLMs 'know' about language: not what they know about any particular language or linguistic phenomenon, but how much they know about and understand language and linguistic concepts. As we shall see, there is a fairly wide gulf in capabilities both among different LLMs and among different linguistic specifications, with it being notably easier for systems to deal with more common patterns than rarer ones. An additional avenue that we explore is the application of our approach to translating from high-resource into low-resource languages. While the results so far are mostly negative, we provide some evidence that an improved version of the present system could afford some real gains in such tasks. https://github.com/SakanaAI/IASC
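The agentic step that "refines its output at each step with commentary feedback on its previous attempt" is a generic generate/critique loop. A minimal sketch, with `generate` and `critique` standing in for hypothetical LLM calls (the control flow, not the IASC implementation):

```python
from typing import Callable

def refine(generate: Callable[[str], str], critique: Callable[[str], str],
           prompt: str, max_rounds: int = 3) -> str:
    """Regenerate an artifact, feeding back commentary on the previous attempt."""
    attempt = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(attempt)
        if feedback == "OK":
            break
        attempt = generate(
            f"{prompt}\n\nPrevious attempt:\n{attempt}\n\nFeedback:\n{feedback}"
        )
    return attempt

# Toy stand-ins for the LLM calls, just to exercise the loop:
versions = iter(["phonology-v1", "phonology-v2"])
result = refine(lambda p: next(versions),
                lambda a: "OK" if a.endswith("v2") else "needs more vowel contrasts",
                "Design a target phonology")
```

The `max_rounds` cap matters in practice: without it, a critic that never says "OK" would loop (and bill) forever.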
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France (0.04)
- (15 more...)
- Research Report > New Finding (1.00)
- Instructional Material (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2
McLaughlin, Oliver, Khurana, Arjun, Merullo, Jack
Large language models demonstrate proficiency on phonetic tasks, such as rhyming, without explicit phonetic or auditory grounding. In this work, we investigate how Llama-3.2-1B-Instruct represents token-level phonetic information. Our results suggest that Llama uses a rich internal model of phonemes to complete phonetic tasks. We provide evidence for high-level organization of phoneme representations in its latent space. In doing so, we also identify a "phoneme mover head" which promotes phonetic information during rhyming tasks. We visualize the output space of this head and find that, while notable differences exist, Llama learns a model of vowels similar to the standard IPA vowel chart for humans, despite receiving no direct supervision to do so.
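Evidence for phonetic structure in a latent space is typically gathered with linear probes on hidden states. A minimal sketch of that methodology, with synthetic 64-dimensional vectors standing in for Llama-3.2-1B activations (the two "vowel classes" here are fabricated purely to show the probe mechanics):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: hidden states for tokens labelled with one of two
# vowel phonemes; in the paper these would be real transformer activations.
d, n = 64, 400
centers = rng.normal(size=(2, d))
X = np.vstack([centers[i] + 0.3 * rng.normal(size=(n, d)) for i in (0, 1)])
y = np.repeat([0, 1], n)

# Linear probe: logistic regression trained by plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted class-1 probability
    g = p - y                                 # gradient of the logistic loss
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

acc = (((X @ w + b) > 0) == (y == 1)).mean()  # probe accuracy on the training set
```

High probe accuracy indicates the classes are linearly separable in the representation space; on real activations one would of course evaluate on held-out tokens rather than the training set.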
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (2 more...)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.95)
- Questionnaire & Opinion Survey (0.69)
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
Tuttösí, Paige, Yeung, H. Henny, Wang, Yue, Aucouturier, Jean-Julien, Lim, Angelica
We present the first text-to-speech (TTS) system tailored to second language (L2) speakers. We use duration differences between American English tense (longer) and lax (shorter) vowels to create a "clarity mode" for Matcha-TTS. Our perception studies showed that French-L1, English-L2 listeners made fewer transcription errors (a reduction of at least 9.15%) when using our clarity mode, and found it more encouraging and respectful than overall slowed down speech. Remarkably, listeners were not aware of these effects: despite the decreased word error rate in clarity mode, listeners still believed that slowing all target words was the most intelligible, suggesting that actual intelligibility does not correlate with perceived intelligibility. Additionally, we found that Whisper-ASR did not use the same cues as L2 speakers to differentiate difficult vowels and is not sufficient to assess the intelligibility of TTS systems for these individuals.
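The core idea of a duration-based "clarity mode" is to widen the tense/lax contrast at the duration level. A sketch under stated assumptions: the ARPAbet-style vowel sets and the scale factors below are illustrative choices of mine, and in Matcha-TTS such scaling would actually be applied to the model's predicted per-phone durations.

```python
TENSE = {"iy", "uw", "ey", "ow"}   # illustrative tense vowels (longer)
LAX = {"ih", "uh", "eh", "ae"}     # illustrative lax vowels (shorter)

def clarity_durations(phones, stretch=1.3, squeeze=0.85):
    """Rescale per-phone durations (seconds): lengthen tense, shorten lax vowels.

    `phones` is a list of (phone_label, duration) pairs; consonants pass through.
    """
    out = []
    for phone, dur in phones:
        if phone in TENSE:
            dur *= stretch
        elif phone in LAX:
            dur *= squeeze
        out.append((phone, round(dur, 4)))
    return out
```

Exaggerating the contrast in this direction, rather than slowing everything uniformly, is what the study compares against "overall slowed down speech."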
- Europe > France (0.04)
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
- Asia > Taiwan (0.04)
LatPhon: Lightweight Multilingual G2P for Romance Languages and English
Chary, Luis Felipe, Ramirez, Miguel Arjona
Grapheme-to-phoneme (G2P) conversion is a key front-end for text-to-speech (TTS), automatic speech recognition (ASR), speech-to-speech translation (S2ST) and alignment systems, especially across multiple Latin-script languages. We present LatPhon, a 7.5M-parameter Transformer jointly trained on six such languages: English, Spanish, French, Italian, Portuguese, and Romanian. On the public ipa-dict corpus, it attains a mean phoneme error rate (PER) of 3.5%, outperforming the byte-level ByT5 baseline (5.4%) and approaching language-specific WFSTs (3.2%) while occupying 30 MB of memory, which makes on-device deployment feasible when needed. These results indicate that compact multilingual G2P can serve as a universal front-end for Latin-script speech pipelines.
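Phoneme error rate, the metric quoted above, is conventionally the Levenshtein (edit) distance between predicted and reference phoneme sequences, normalised by reference length. A self-contained sketch of that standard computation (not LatPhon's evaluation code):

```python
def phoneme_error_rate(ref, hyp):
    """Edit distance between phoneme sequences, divided by reference length."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # deletions only
    for j in range(n + 1):
        d[0][j] = j                      # insertions only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[m][n] / max(m, 1)
```

For example, predicting ["k", "a", "t"] against the reference ["k", "æ", "t"] gives one substitution out of three phonemes, i.e. a PER of 1/3.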
- South America > Brazil (0.05)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)