AITopics | intonation pattern

Collaborating Authors

intonation pattern

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity

Pouw, Charlotte, Alishahi, Afra, Zuidema, Willem

arXiv.org Artificial IntelligenceOct-16-2025

We analyze the syntactic sensitivity of Text-to-Speech (TTS) systems using methods inspired by psycholinguistic research. Specifically, we focus on the generation of intonational phrase boundaries, which can often be predicted by identifying syntactic boundaries within a sentence. We find that TTS systems struggle to accurately generate intonational phrase boundaries in sentences where syntactic boundaries are ambiguous (e.g., garden path sentences or sentences with attachment ambiguity). In these cases, systems need superficial cues such as commas to place boundaries at the correct positions. In contrast, for sentences with simpler syntactic structures, we find that systems do incorporate syntactic cues beyond surface markers. Finally, we finetune models on sentences without commas at the syntactic boundary positions, encouraging them to focus on more subtle linguistic cues. Our findings indicate that this leads to more distinct intonation patterns that better reflect the underlying structure.

artificial intelligence, boundary, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.conll-1.9

2505.22236

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis

He, Xiangheng, Chen, Junjie, Zhang, Zixing, Schuller, Björn W.

arXiv.org Artificial IntelligenceDec-19-2024

Prosody contains rich information beyond the literal meaning of words, which is crucial for the intelligibility of speech. Current models still fall short in phrasing and intonation; they not only miss or misplace breaks when synthesizing long sentences with complex structures but also produce unnatural intonation. We propose ProsodyFM, a prosody-aware text-to-speech synthesis (TTS) model with a flow-matching (FM) backbone that aims to enhance the phrasing and intonation aspects of prosody. ProsodyFM introduces two key components: a Phrase Break Encoder to capture initial phrase break locations, followed by a Duration Predictor for the flexible adjustment of break durations; and a Terminal Intonation Encoder which learns a bank of intonation shape tokens combined with a novel Pitch Processor for more robust modeling of human-perceived intonation change. ProsodyFM is trained with no explicit prosodic labels and yet can uncover a broad spectrum of break durations and intonation patterns. Experimental results demonstrate that ProsodyFM can effectively improve the phrasing and intonation aspects of prosody, thereby enhancing the overall intelligibility compared to four state-of-the-art (SOTA) models. Out-of-distribution experiments show that this prosody improvement can further bring ProsodyFM superior generalizability for unseen complex sentences and speakers. Our case study intuitively illustrates the powerful and fine-grained controllability of ProsodyFM over phrasing and intonation.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.11795

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Massachusetts (0.04)
(7 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.91)

Add feedback

Word-wise intonation model for cross-language TTS systems

A., Tomilov A., Y., Gromova A., N, Svischev A.

arXiv.org Artificial IntelligenceSep-30-2024

In this paper we propose a word-wise intonation model for Russian language and show how it can be generalized for other languages. The proposed model is suitable for automatic data markup and its extended application to text-to-speech systems. It can also be implemented for an intonation contour modeling by using rule-based algorithms or by predicting contours with language models. The key idea is a partial elimination of the variability connected with different placements of a stressed syllable in a word. It is achieved with simultaneous applying of pitch simplification with a dynamic time warping clustering. The proposed model could be used as a tool for intonation research or as a backbone for prosody description in text-to-speech systems. As the advantage of the model, we show its relations with the existing intonation systems as well as the possibility of using language models for prosody prediction. Finally, we demonstrate some practical evidence of the system robustness to parameter variations.

contour, intonation pattern, syllable, (16 more...)

arXiv.org Artificial Intelligence

2409.20374

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
Asia > Russia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.55)

Add feedback

Into-TTS : Intonation Template Based Prosody Control System

Lee, Jihwan, Lee, Joun Yeop, Choi, Heejin, Mun, Seongkyu, Park, Sangjun, Bae, Jae-Sung, Kim, Chanwoo

arXiv.org Artificial IntelligenceNov-6-2022

Intonations play an important role in delivering the intention of a speaker. However, current end-to-end TTS systems often fail to model proper intonations. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in different intonations using predefined intonation templates. Prior to TTS model training, speech data are grouped into intonation templates in an unsupervised manner. Two proposed modules are added to the end-to-end TTS framework: an intonation predictor and an intonation encoder. The intonation predictor recommends a suitable intonation template to the given text. The intonation encoder, attached to the text encoder output, synthesizes speech abiding the requested intonation template. Main contributions of our paper are: (a) an easy-to-use intonation control system covering a wide range of users; (b) better performance in wrapping speech in a requested intonation with improved objective and subjective evaluation; and (c) incorporating a pre-trained language model for intonation modelling. Audio samples are available at https://srtts.github.io/IntoTTS.

intonation, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2204.01271

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Add feedback

Giving your content a voice with the Newscaster speaking style from Amazon Polly

#artificialintelligenceJul-8-2020, 20:32:15 GMT

Audio content consumption has grown exponentially in the past few years. Statista reports that podcast ad revenue will exceed a billion dollars in 2021. For the publishing industry and content providers, providing audio as an alternative option to reading could improve engagement with users and be an incremental revenue stream. Given the shift in customer trends to audio consumption, Amazon Polly launched a new speaking style focusing on the publishing industry: the Newscaster speaking style. This post discusses how the Newscaster voice was built and how you can use the Newscaster voice with your content in a few simple steps.

amazon polly, artificial intelligence, machine learning, (14 more...)

#artificialintelligence

Country: North America > United States (0.05)

Industry: Media > Publishing (0.98)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.86)

Add feedback