AITopics | pitch encoder

pitch encoder

Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis

Wang, Xintong, Shi, Mingqian, Wang, Ye

arXiv.org Artificial Intelligence, Jun-6-2024

Mispronunciation Detection and Diagnosis (MDD) systems, leveraging Automatic Speech Recognition (ASR), face two main challenges in Mandarin Chinese: 1) The two-stage models create an information gap between the phoneme or tone classification stage and the MDD stage. ...

Subsequently, Zhang et al. [1] adopted an autoregressive model, the Recurrent Neural Network Transducer (RNN-T) [9], for MDD. This approach aims to capture the temporal dependence of mispronunciation patterns, showing better performance than Connectionist Temporal Classification ...

diagnosis, pitch encoder, pitch fusion block, (11 more...)

2406.04595

Country: Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Pitch Preservation In Singing Voice Synthesis

Liu, Shujun, Zhu, Hai, Wang, Kun, Wang, Huajun

arXiv.org Artificial Intelligence, Oct-12-2021

Because singing voice corpora are limited, existing singing voice synthesis (SVS) methods that use encoder-decoder neural networks to generate spectrograms directly can go out of tune during inference. To mitigate this, the paper presents a novel acoustic model with independent pitch and phoneme encoders, which disentangles the phoneme and pitch information in the music score to make full use of the corpus. Specifically, following equal temperament theory, the pitch encoder is constrained by a pitch metric loss that maps distances between adjacent input pitches onto the corresponding frequency multiples between the encoder outputs. For the phoneme encoder, based on the observation that the same phoneme sung at different pitches is pronounced similarly, the encoder is followed by an adversarially trained pitch classifier that forces identical phonemes with different pitches to map into the same phoneme feature space. In this way, the sparse phonemes and pitches of the original input spaces are transformed into more compact feature spaces, where identical elements cluster closely and cooperate to enhance synthesis quality. The outputs of the two encoders are then summed and passed to the decoder of the acoustic model. Experimental results indicate that the proposed approach captures the intrinsic structure among pitch inputs, yielding better pitch accuracy and better singing synthesis performance than an advanced baseline system.
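To make the equal temperament relation behind the pitch metric loss concrete, here is a minimal Python sketch. In twelve-tone equal temperament, notes one semitone apart differ in frequency by a factor of 2^(1/12), so a pitch distance of d semitones corresponds to a frequency multiple of 2^(d/12). The abstract does not give the exact form of the loss, so `pitch_metric_penalty` below is a hypothetical scalar illustration, not the authors' formulation. The MIDI note numbering and the A4 = 440 Hz reference are also assumptions that do not appear in the source.

```python
# Assumed reference tuning (not stated in the abstract): MIDI note 69 is A4 at 440 Hz.
A4_MIDI = 69
A4_HZ = 440.0


def midi_to_hz(note: int) -> float:
    """Frequency of a MIDI note number under twelve-tone equal temperament."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def frequency_multiple(note_a: int, note_b: int) -> float:
    """Frequency ratio f(note_b) / f(note_a) implied by the semitone distance."""
    return 2.0 ** ((note_b - note_a) / 12.0)


def pitch_metric_penalty(note_a: int, note_b: int, out_a: float, out_b: float) -> float:
    """Hypothetical penalty on two scalar encoder outputs.

    It is zero when the ratio of the outputs equals the frequency multiple
    between the two input pitches, and grows as the ratio drifts away.
    """
    target = frequency_multiple(note_a, note_b)
    return (out_b / out_a - target) ** 2


if __name__ == "__main__":
    # One semitone up from middle C (MIDI 60 to 61).
    print(frequency_multiple(60, 61))
    # Outputs in a 2:1 ratio match an octave (12 semitones) exactly.
    print(pitch_metric_penalty(60, 72, 1.0, 2.0))
```

A real pitch encoder emits vectors rather than scalars, so an actual loss would need some way to compare vector outputs (a norm, for example). The sketch only shows the target relationship that the abstract describes.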

classifier, encoder, phoneme encoder, (15 more...)

2110.05033

Country:

Asia > China > Sichuan Province > Chengdu (0.05)
North America > Canada > Quebec > Montreal (0.04)
Europe > Portugal > Braga > Braga (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (0.90)
Media > Music (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)