vowel
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.40)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.95)
- Questionnaire & Opinion Survey (0.69)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
A Rhythm-Aware Phrase Insertion for Classical Arabic Poetry Composition
Elzohbi, Mohamad, Zhao, Richard
This paper presents a methodology for inserting phrases in Arabic poems to conform to a specific rhythm using ByT5, a byte-level multilingual transformer-based model. Our work discusses a rule-based grapheme-to-beat transformation tailored for extracting the rhythm from fully diacritized Arabic script. Our approach employs a conditional denoising objective to fine-tune ByT5, where the model reconstructs masked words to match a target rhythm. We adopt a curriculum learning strategy, pre-training on a general Arabic dataset before fine-tuning on a poetic dataset, and explore cross-lingual transfer from English to Arabic. Experimental results demonstrate that our models achieve high rhythmic alignment while maintaining semantic coherence. The proposed model has the potential to be used in co-creative applications in the process of composing classical Arabic poems.
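A rule-based grapheme-to-beat transform of the kind the abstract describes can be sketched as follows. This is my own simplification, not the authors' algorithm: in Arabic prosody, a letter carrying a short vowel (haraka) is "moving" (here 1), while a letter with sukun, a bare long-vowel letter, or the latent nun of tanwin is "still" (0).

```python
HARAKAT = {"\u064E", "\u064F", "\u0650"}        # fatha, damma, kasra
TANWIN = {"\u064B", "\u064C", "\u064D"}         # -an, -un, -in
SUKUN, SHADDA = "\u0652", "\u0651"
LONG_VOWELS = {"\u0627", "\u0648", "\u064A"}    # alif, waw, ya'

def graphemes_to_beats(text: str) -> str:
    """Map fully diacritized Arabic to a beat string of 1 (moving) / 0 (still)."""
    chars = [c for c in text if not c.isspace()]
    beats, i = [], 0
    while i < len(chars):
        # collect the combining marks attached to this base letter
        j = i + 1
        marks = []
        while j < len(chars) and chars[j] in HARAKAT | TANWIN | {SUKUN, SHADDA}:
            marks.append(chars[j])
            j += 1
        if SHADDA in marks:
            beats.append("0")                   # first half of a geminate is still
        if any(m in HARAKAT for m in marks):
            beats.append("1")
        elif any(m in TANWIN for m in marks):
            beats.append("10")                  # short vowel + unwritten still nun
        elif SUKUN in marks or chars[i] in LONG_VOWELS:
            beats.append("0")
        i = j
    return "".join(beats)
```

For example, `graphemes_to_beats("كَتَبَ")` (kataba, three vowelled letters) yields `"111"`, and `graphemes_to_beats("قَالَ")` (qāla, with a long alif) yields `"101"`. Real ʿarūḍ scansion handles more cases (hamza forms, alif maqsura, elision), so treat this as a starting point only.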
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.15)
- North America > United States > Indiana (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Research Report > New Finding (0.34)
- Overview > Innovation (0.34)
Sperm whales use vowels like humans, new study finds
Scientists decoding whale clicks found patterns that echo the building blocks of human speech. The marine mammals have a complex communication system that scientists are working to decode. A new study discovered a fresh component of their varied vocalizations that could hint at potential language structures. Sperm whales exhibit patterns similar to human vowels and diphthongs (a connected pair of vowels in a word, such as the "oi" in "coin").
- South America > Brazil (0.05)
- North America > United States > California > Alameda County > Berkeley (0.05)
- North America > Dominica (0.05)
- (2 more...)
Dynamical model parameters from ultrasound tongue kinematics
Kirkham, Sam, Strycharczuk, Patrycja
A common approach is to cast this problem in terms of a dynamical system with point attractor dynamics, where a small number of parameters drive the vocal tract to a stable equilibrium position (Browman and Goldstein, 1986; Fowler, 1980; Gafos, 2006; Saltzman and Munhall, 1989; Tilsen, 2016). A standard model in this framework is the linear harmonic oscillator, m x'' + b x' + k x = 0 (1), where m is mass (typically m = 1), k is a stiffness coefficient, and b is a damping coefficient, usually set to the critically damped value b = 2√(mk). Gestural activation can be governed by step activation, with gestural parameters changing instantaneously at the point of activation and remaining constant over the activation interval. In this study we focus on whether the parameters of a linear harmonic oscillator can be estimated from ultrasound tongue imaging data, which we compare with the more common method of fitting to electromagnetic articulography (EMA) data. A major barrier to this goal is that the linear harmonic oscillator is known to be a poor fit to empirical articulatory trajectories, as it predicts overly short time-to-peak velocity, meaning that it is inappropriate for evaluating how well the model can fit different data modalities. There are three common solutions to this issue. The first allows gestural activation to vary over time (Byrd and Saltzman, 1998), which adds extrinsic complexity to the model. The second is a nonlinear model, such as adding a cubic term to the linear model (Kirkham, 2025b; Sorensen and Gafos, 2016), or novel nonlinear models (Stern and Shaw, 2025). The third is to abandon oscillatory models and develop new time-dependent models.
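The critically damped oscillator and its "overly short time-to-peak velocity" can be illustrated directly from the closed-form solution. This sketch is mine, not the authors' code: with b = 2√(mk) the characteristic equation has a repeated root -ω (ω = √(k/m)), and a gesture starting at rest peaks in velocity at t = 1/ω, early in the movement.

```python
import numpy as np

def critically_damped(t, x0, target, k, m=1.0):
    """Trajectory of m x'' + b x' + k x = 0 (b = 2*sqrt(m*k)), from rest at x0."""
    omega = np.sqrt(k / m)            # repeated root of the characteristic equation
    A = x0 - target
    return target + (A + omega * A * t) * np.exp(-omega * t)

t = np.linspace(0.0, 1.0, 1001)
x = critically_damped(t, x0=0.0, target=1.0, k=100.0)   # omega = 10
v = np.gradient(x, t)
t_peak = t[np.argmax(np.abs(v))]      # analytic peak velocity is at 1/omega = 0.1 s
```

Here the movement takes roughly 0.5-0.8 s to settle, yet velocity peaks at 0.1 s, i.e. well before the midpoint; empirical articulatory gestures peak much later, which is the mismatch the abstract refers to.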
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.34)
AI and the End of Accents
I sound Korean, because I am Korean. Can AI make me sound American? It all began, as these things often do, with an Instagram ad. "No one tells you this if you're an immigrant, but accent discrimination is a real thing," said a woman in the video. Her own accent is faintly Eastern European, so subtle it took me a few playbacks to notice.
- Asia > China (0.16)
- North America > United States > Ohio (0.05)
- North America > United States > New York (0.05)
- (8 more...)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
- Information Technology > Communications > Social Media (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
IASC: Interactive Agentic System for ConLangs
Taguchi, Chihiro, Sproat, Richard
We present a system that uses LLMs as a tool in the development of Constructed Languages. The system is modular in that one first creates a target phonology for the language using an agentic approach that refines its output at each step with commentary feedback on its previous attempt. Next, a set of sentences is 'translated' from their English originals into a morphosyntactic markup that reflects the word order and morphosyntactic feature specifications of the desired target language, with affixes represented as morphosyntactic feature bundles. From this translated corpus, a lexicon is constructed using the phonological model and the set of morphemes (stems and affixes) extracted from the 'translated' sentences. The system is then instructed to provide an orthography for the language, using an existing script such as Latin or Cyrillic. Finally, the system writes a brief grammatical handbook of the language. The system can also translate further sentences into the target language. Our goal is twofold. First, we hope that these tools will be fun to use for creating artificially constructed languages. Second, we are interested in exploring what LLMs 'know' about language: not what they know about any particular language or linguistic phenomenon, but how much they know about and understand language and linguistic concepts. As we shall see, there is a fairly wide gulf in capabilities both among different LLMs and among different linguistic specifications, with it being notably easier for systems to deal with more common patterns than rarer ones. An additional avenue that we explore is the application of our approach to translating from high-resource into low-resource languages. While the results so far are mostly negative, we provide some evidence that an improved version of the present system could afford some real gains in such tasks. https://github.com/SakanaAI/IASC
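The agentic step that "refines its output at each step with commentary feedback on its previous attempt" is a generic generate/critique loop. A minimal sketch, with `generate` and `critique` standing in for hypothetical LLM calls (the control flow, not the IASC implementation):

```python
from typing import Callable

def refine(generate: Callable[[str], str], critique: Callable[[str], str],
           prompt: str, max_rounds: int = 3) -> str:
    """Regenerate an artifact, feeding back commentary on the previous attempt."""
    attempt = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(attempt)
        if feedback == "OK":
            break
        attempt = generate(
            f"{prompt}\n\nPrevious attempt:\n{attempt}\n\nFeedback:\n{feedback}"
        )
    return attempt

# Toy stand-ins for the LLM calls, just to exercise the loop:
versions = iter(["phonology-v1", "phonology-v2"])
result = refine(lambda p: next(versions),
                lambda a: "OK" if a.endswith("v2") else "needs more vowel contrasts",
                "Design a target phonology")
```

The `max_rounds` cap matters in practice: without it, a critic that never says "OK" would loop (and bill) forever.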
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France (0.04)
- (15 more...)
- Research Report > New Finding (1.00)
- Instructional Material (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2
McLaughlin, Oliver, Khurana, Arjun, Merullo, Jack
Large language models demonstrate proficiency on phonetic tasks, such as rhyming, without explicit phonetic or auditory grounding. In this work, we investigate how Llama-3.2-1B-Instruct represents token-level phonetic information. Our results suggest that Llama uses a rich internal model of phonemes to complete phonetic tasks. We provide evidence for high-level organization of phoneme representations in its latent space. In doing so, we also identify a "phoneme mover head" which promotes phonetic information during rhyming tasks. We visualize the output space of this head and find that, while notable differences exist, Llama learns a model of vowels similar to the standard IPA vowel chart for humans, despite receiving no direct supervision to do so.
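Evidence for phonetic structure in a latent space is typically gathered with linear probes on hidden states. A minimal sketch of that methodology, with synthetic 64-dimensional vectors standing in for Llama-3.2-1B activations (the two "vowel classes" here are fabricated purely to show the probe mechanics):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: hidden states for tokens labelled with one of two
# vowel phonemes; in the paper these would be real transformer activations.
d, n = 64, 400
centers = rng.normal(size=(2, d))
X = np.vstack([centers[i] + 0.3 * rng.normal(size=(n, d)) for i in (0, 1)])
y = np.repeat([0, 1], n)

# Linear probe: logistic regression trained by plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted class-1 probability
    g = p - y                                 # gradient of the logistic loss
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

acc = (((X @ w + b) > 0) == (y == 1)).mean()  # probe accuracy on the training set
```

High probe accuracy indicates the classes are linearly separable in the representation space; on real activations one would of course evaluate on held-out tokens rather than the training set.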
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (2 more...)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.95)
- Questionnaire & Opinion Survey (0.69)
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
Tuttösí, Paige, Yeung, H. Henny, Wang, Yue, Aucouturier, Jean-Julien, Lim, Angelica
We present the first text-to-speech (TTS) system tailored to second language (L2) speakers. We use duration differences between American English tense (longer) and lax (shorter) vowels to create a "clarity mode" for Matcha-TTS. Our perception studies showed that French-L1, English-L2 listeners made fewer transcription errors (a reduction of at least 9.15%) when using our clarity mode, and found it more encouraging and respectful than overall slowed down speech. Remarkably, listeners were not aware of these effects: despite the decreased word error rate in clarity mode, listeners still believed that slowing all target words was the most intelligible, suggesting that actual intelligibility does not correlate with perceived intelligibility. Additionally, we found that Whisper-ASR did not use the same cues as L2 speakers to differentiate difficult vowels and is not sufficient to assess the intelligibility of TTS systems for these individuals.
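The core idea of a duration-based "clarity mode" is to widen the tense/lax contrast at the duration level. A sketch under stated assumptions: the ARPAbet-style vowel sets and the scale factors below are illustrative choices of mine, and in Matcha-TTS such scaling would actually be applied to the model's predicted per-phone durations.

```python
TENSE = {"iy", "uw", "ey", "ow"}   # illustrative tense vowels (longer)
LAX = {"ih", "uh", "eh", "ae"}     # illustrative lax vowels (shorter)

def clarity_durations(phones, stretch=1.3, squeeze=0.85):
    """Rescale per-phone durations (seconds): lengthen tense, shorten lax vowels.

    `phones` is a list of (phone_label, duration) pairs; consonants pass through.
    """
    out = []
    for phone, dur in phones:
        if phone in TENSE:
            dur *= stretch
        elif phone in LAX:
            dur *= squeeze
        out.append((phone, round(dur, 4)))
    return out
```

Exaggerating the contrast in this direction, rather than slowing everything uniformly, is what the study compares against "overall slowed down speech."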
- Europe > France (0.04)
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
- Asia > Taiwan (0.04)
LatPhon: Lightweight Multilingual G2P for Romance Languages and English
Chary, Luis Felipe, Ramirez, Miguel Arjona
Grapheme-to-phoneme (G2P) conversion is a key front-end for text-to-speech (TTS), automatic speech recognition (ASR), speech-to-speech translation (S2ST) and alignment systems, especially across multiple Latin-script languages. We present LatPhon, a 7.5M-parameter Transformer jointly trained on six such languages: English, Spanish, French, Italian, Portuguese, and Romanian. On the public ipa-dict corpus, it attains a mean phoneme error rate (PER) of 3.5%, outperforming the byte-level ByT5 baseline (5.4%) and approaching language-specific WFSTs (3.2%) while occupying 30 MB of memory, which makes on-device deployment feasible when needed. These results indicate that compact multilingual G2P can serve as a universal front-end for Latin-script speech pipelines.
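Phoneme error rate, the metric quoted above, is conventionally the Levenshtein (edit) distance between predicted and reference phoneme sequences, normalised by reference length. A self-contained sketch of that standard computation (not LatPhon's evaluation code):

```python
def phoneme_error_rate(ref, hyp):
    """Edit distance between phoneme sequences, divided by reference length."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # deletions only
    for j in range(n + 1):
        d[0][j] = j                      # insertions only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[m][n] / max(m, 1)
```

For example, predicting ["k", "a", "t"] against the reference ["k", "æ", "t"] gives one substitution out of three phonemes, i.e. a PER of 1/3.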
- South America > Brazil (0.05)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)