Sperm whales use vowels like humans, new study finds

Popular Science

Scientists decoding sperm whale clicks have found patterns that echo the building blocks of human speech. The marine mammals have a complex communication system that researchers are working to decode, and a new study has identified a fresh component of their varied vocalizations that could hint at potential language structures: sperm whales exhibit patterns similar to human vowels and diphthongs, a connected pair of vowels within a single syllable, such as the "oi" sound.


Towards a dynamical model of English vowels. Evidence from diphthongisation

Strycharczuk, Patrycja, Kirkham, Sam, Gorman, Emily, Nagamine, Takayuki

arXiv.org Artificial Intelligence

Diphthong vowels exhibit a degree of inherent dynamic change, the extent of which can vary synchronically and diachronically, such that diphthong vowels can become monophthongs and vice versa. Modelling this type of change requires defining diphthongs in opposition to monophthongs. However, formulating an explicit definition has proven elusive in acoustics and articulation, as diphthongisation is often gradient in these domains. In this study, we consider whether diphthong vowels form a coherent phonetic category from the articulatory point of view. We present articulometry and acoustic data from six speakers of Northern Anglo-English producing a full set of phonologically long vowels. We analyse several measures of diphthongisation, all of which suggest that diphthongs are not categorically distinct from long monophthongs. We account for this observation with an Articulatory Phonology/Task Dynamic model in which diphthongs and long monophthongs have a common gestural representation, comprising two articulatory targets in each case, but they differ according to gestural constriction and location of the component gestures. We argue that a two-target representation for all long vowels is independently supported by phonological weight, as well as by the nature of historical diphthongisation and present-day dynamic vowel variation in British English.
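One family of diphthongisation measures the abstract alludes to quantifies how much the formants move over the course of the vowel. A minimal sketch (not the paper's actual measure) is the Euclidean trajectory length of the (F1, F2) contour; the formant values below are illustrative, not measured data.

```python
# Trajectory length of a vowel's (F1, F2) contour as a crude
# diphthongisation index: monophthongs move little, diphthongs a lot.

def trajectory_length(f1, f2):
    """Sum of Euclidean distances between consecutive (F1, F2) samples, in Hz."""
    total = 0.0
    for i in range(1, len(f1)):
        total += ((f1[i] - f1[i - 1]) ** 2 + (f2[i] - f2[i - 1]) ** 2) ** 0.5
    return total

# A near-monophthong: formants barely move across the vowel.
mono_f1 = [300, 302, 301, 303]
mono_f2 = [2200, 2205, 2198, 2202]

# A diphthong-like /ai/ trajectory: large F1 fall, large F2 rise.
diph_f1 = [750, 650, 500, 350]
diph_f2 = [1300, 1600, 1900, 2200]

print(trajectory_length(mono_f1, mono_f2) < trajectory_length(diph_f1, diph_f2))  # True
```

Because such measures are gradient rather than bimodal, they illustrate why the study finds no categorical acoustic boundary between diphthongs and long monophthongs.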


Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support

Watcharasupat, Karn N., Wu, Chih-Wei, Orife, Iroro

arXiv.org Artificial Intelligence

Cinematic audio source separation (CASS) is a relatively new subtask of audio source separation, concerned with the separation of a mixture into the dialogue, music, and effects stems. To date, only one publicly available dataset exists for CASS, that is, the Divide and Remaster (DnR) dataset, which is currently at version 2. While DnR v2 has been an incredibly useful resource for CASS, several areas of improvement have been identified, particularly through its use in the 2023 Sound Demixing Challenge. In this work, we develop version 3 of the DnR dataset, addressing issues relating to vocal content in non-dialogue stems, loudness distributions, mastering process, and linguistic diversity. In particular, the dialogue stem of DnR v3 includes speech content from more than 30 languages from multiple families including but not limited to the Germanic, Romance, Indo-Aryan, Dravidian, Malayo-Polynesian, and Bantu families. Benchmark results using the Bandit model indicated that training on multilingual data significantly improves the model's generalizability, even to languages with low data availability. Even in languages with high data availability, the multilingual model often performs on par with or better than dedicated models trained on monolingual CASS datasets.
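A CASS mixture is formed by summing the dialogue, music, and effects stems sample-wise, typically after scaling each stem toward a target loudness. The sketch below mirrors that construction only loosely: DnR's actual pipeline uses proper loudness units (LUFS), whereas this hypothetical version substitutes a simple RMS proxy.

```python
# Toy stem mixing with RMS-based level normalisation (an assumption;
# real mastering pipelines use LUFS, not RMS).

def rms(x):
    """Root-mean-square level of a list of samples."""
    return (sum(s * s for s in x) / len(x)) ** 0.5

def scale_to_rms(x, target):
    """Scale a stem so its RMS matches `target` (no-op for silent stems)."""
    current = rms(x)
    if current == 0.0:
        return list(x)
    gain = target / current
    return [s * gain for s in x]

def mix(stems, target_rms=0.1):
    """Sum equal-length stems after normalising each to target_rms."""
    scaled = [scale_to_rms(s, target_rms) for s in stems]
    return [sum(samples) for samples in zip(*scaled)]

# Illustrative 4-sample stems standing in for dialogue, music, effects.
dialogue = [0.5, -0.5, 0.5, -0.5]
music = [0.1, 0.1, -0.1, -0.1]
effects = [0.0, 0.2, 0.0, -0.2]
mixture = mix([dialogue, music, effects])
```

Separation models are then trained to invert this summation, recovering the three stems from the mixture alone.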


IPA Transcription of Bengali Texts

Fatema, Kanij, Haider, Fazle Dawood, Turpa, Nirzona Ferdousi, Azmal, Tanveer, Ahmed, Sourav, Hasan, Navid, Rahman, Mohammad Akhlaqur, Sarkar, Biplab Kumar, Jahin, Afrar, Hassan, Md. Rezuwan, Zihad, Md Foriduzzaman, Faruque, Rubayet Sabbir, Sushmit, Asif, Imtiaz, Mashrur, Sadeque, Farig, Rahman, Syed Shahrier

arXiv.org Artificial Intelligence

The International Phonetic Alphabet (IPA) serves to systematize phonemes in language, enabling precise textual representation of pronunciation. In Bengali phonology and phonetics, ongoing scholarly deliberations persist concerning the IPA standard and core Bengali phonemes. This work examines prior research, identifies current and potential issues, and suggests a framework for a Bengali IPA standard, facilitating linguistic analysis and NLP resource creation and downstream technology development. In this work, we present a comprehensive study of Bengali IPA transcription and introduce a novel IPA transcription framework incorporating a novel dataset with DL-based benchmarks.
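To make the transcription task concrete, here is a toy, hypothetical grapheme-to-IPA lookup illustrating what a rule-based Bengali baseline might look like. The mapping covers only a few characters and ignores the context-dependent rules (e.g., inherent-vowel deletion) that real Bengali G2P, and the paper's DL-based benchmarks, must handle.

```python
# Hypothetical minimal grapheme-to-IPA table; entries are illustrative
# and not the paper's proposed standard.
IPA_MAP = {
    "ক": "k",
    "খ": "kʰ",
    "গ": "g",
    "া": "a",
    "ি": "i",
}

def naive_transcribe(text):
    """Replace each known grapheme with its IPA symbol; keep unknowns as-is."""
    return "".join(IPA_MAP.get(ch, ch) for ch in text)

print(naive_transcribe("কা"))  # ka
```

The limits of such character-by-character substitution are precisely what motivate a standardized phoneme inventory and learned transcription models.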


Refining a Deep Learning-based Formant Tracker using Linear Prediction Methods

Alku, Paavo, Kadiri, Sudarsana Reddy, Gowda, Dhananjaya

arXiv.org Artificial Intelligence

In this study, formant tracking is investigated by refining the formants tracked by an existing data-driven tracker, DeepFormants, using the formants estimated in a model-driven manner by linear prediction (LP)-based methods. As LP-based formant estimation methods, conventional covariance analysis (LP-COV) and the recently proposed quasi-closed phase forward-backward (QCP-FB) analysis are used. In the proposed refinement approach, the contours of the three lowest formants are first predicted by the data-driven DeepFormants tracker, and the predicted formants are replaced frame-wise with local spectral peaks shown by the model-driven LP-based methods. The refinement procedure can be plugged into the DeepFormants tracker with no need for any new data learning. Two refined DeepFormants trackers were compared with the original DeepFormants and with five known traditional trackers using the popular vocal tract resonance (VTR) corpus. The results indicated that the data-driven DeepFormants trackers outperformed the conventional trackers and that the best performance was obtained by refining the formants predicted by DeepFormants using QCP-FB analysis. In addition, by tracking formants using VTR speech that was corrupted by additive noise, the study showed that the refined DeepFormants trackers were more resilient to noise than the reference trackers. In general, these results suggest that LP-based model-driven approaches, which have traditionally been used in formant estimation, can be combined with a modern data-driven tracker easily with no further training to improve the tracker's performance.
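The core refinement step described above, replacing each formant predicted by the data-driven tracker with the nearest spectral peak found by an LP-based method, can be sketched in a few lines. All values below are illustrative, and the paper's actual frame-wise matching criteria may differ.

```python
# Snap each predicted formant to the closest LP spectral peak, one frame
# at a time. This needs no retraining of the neural tracker.

def refine_frame(predicted_formants, lp_peaks):
    """Replace each predicted formant (Hz) with the nearest LP peak (Hz)."""
    return [min(lp_peaks, key=lambda p: abs(p - f)) for f in predicted_formants]

# One frame: DeepFormants-style predictions for F1-F3, plus LP peak candidates.
predicted = [520.0, 1480.0, 2550.0]
peaks = [495.0, 1510.0, 2600.0, 3400.0]
print(refine_frame(predicted, peaks))  # [495.0, 1510.0, 2600.0]
```

Because the neural prediction only selects among model-driven peaks, the hybrid inherits the tracker's robustness while anchoring each formant to spectral evidence in the frame.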


OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking

Rakib, Fazle Rabbi, Dip, Souhardya Saha, Alam, Samiul, Tasnim, Nazia, Shihab, Md. Istiak Hossain, Ansary, Md. Nazmuddoha, Hossen, Syed Mobassir, Meghla, Marsia Haque, Mamun, Mamunur, Sadeque, Farig, Chowdhury, Sayma Sultana, Reasat, Tahsin, Sushmit, Asif, Humayun, Ahmed Imtiaz

arXiv.org Artificial Intelligence

Being one of the most spoken languages globally, Bengali exhibits large diversity in dialects and prosodic features, which demands that ASR frameworks be robust to distribution shifts. For example, Islamic religious sermons in Bengali are delivered with a tonality that differs significantly from regular speech. Our training dataset is collected via massive online crowdsourcing campaigns, resulting in 1177.94 hours collected and curated from 22,645 native Bengali speakers from South Asia. Our test dataset comprises 23.03 hours of speech collected and manually annotated from 17 different sources, e.g., Bengali TV dramas, audiobooks, talk shows, online classes, and Islamic sermons. OOD-Speech is jointly the largest publicly available speech dataset, as well as the first out-of-distribution ASR benchmarking dataset for Bengali.

Figure 1: t-Stochastic Neighbor Embeddings [6] of the speech data.