AITopics | fastpitch

Collaborating Authors

fastpitch

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Incremental FastPitch: Chunk-based High Quality Text to Speech

Du, Muyang, Liu, Chuan, Lai, Junjie

arXiv.org Artificial IntelligenceJan-3-2024

ABSTRACT Parallel text-to-speech models have been widely applied for real-time speech synthesis, and they offer more controllability and a much faster synthesis process compared with conventional auto-regressive models. Although parallel models have benefits in many aspects, they become naturally unfit for incremental synthesis due to their fully parallel architecture such as transformer. In this work, we propose Incremental FastPitch, a novel FastPitch variant capable of incrementally producing high-quality Mel chunks by improving the architecture with chunk-based FFT blocks, training with receptivefield constrained chunk attention masks, and inference with fixed size past model states. Experimental results show that our proposal can produce speech quality comparable to the parallel FastPitch, with a significant lower latency that allows even lower response time for real-time speech applications. Index Terms-- text-to-speech, speech synthesis, realtime, low-latency, streaming tts 1. INTRODUCTION In recent years, Text-to-Speech (TTS) technology has witnessed Figure 1: Incremental FastPitch, Chunk-based FFT Block, and remarkable advancements, enabling the generation of Chunk Mask for Receptive-Filed Constrained Training natural and expressive speech from text inputs.

fastpitch, incremental fastpitch, inference, (16 more...)

arXiv.org Artificial Intelligence

2401.01755

Country:

North America > United States (0.14)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)

Add feedback

Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

Hsieh, Cheng-Ping, Ghosh, Subhankar, Ginsburg, Boris

arXiv.org Artificial IntelligenceNov-1-2022

Fine-tuning is a popular method for adapting text-to-speech (TTS) models to new speakers. However this approach has some challenges. Usually fine-tuning requires several hours of high quality speech per speaker. There is also that fine-tuning will negatively affect the quality of speech synthesis for previously learnt speakers. In this paper we propose an alternative approach for TTS adaptation based on using parameter-efficient adapter modules. In the proposed approach, a few small adapter modules are added to the original network. The original weights are frozen, and only the adapters are fine-tuned on speech for new speaker. The parameter-efficient fine-tuning approach will produce a new model with high level of parameter sharing with original model. Our experiments on LibriTTS, HiFi-TTS and VCTK datasets validate the effectiveness of adapter-based method through objective and subjective metrics.

artificial intelligence, machine learning, module, (17 more...)

arXiv.org Artificial Intelligence

2211.00585

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.93)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data

Zhang, Xulong, Wang, Jianzong, Cheng, Ning, Xiao, Jing

arXiv.org Artificial IntelligenceOct-25-2022

In this paper, we proposed Adapitch, a multi-speaker TTS method that makes adaptation of the supervised module with untranscribed data. We design two self supervised modules to train the text encoder and mel decoder separately with untranscribed data to enhance the representation of text and mel. To better handle the prosody information in a synthesized voice, a supervised TTS module is designed conditioned on content disentangling of pitch, text, and speaker. The training phase was separated into two parts, pretrained and fixed the text encoder and mel decoder with unsupervised mode, then the supervised mode on the disentanglement of TTS. Experiment results show that the Adaptich achieved much better quality than baseline methods.

artificial intelligence, machine learning, module, (17 more...)

arXiv.org Artificial Intelligence

2210.13803

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback