AITopics | gan-tts

gan-tts

c5d736809766d46260d816d8dbc9eb44-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-10-2026, 06:24:35 GMT

fine-tuning, ground truth, melgan, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.90)

We thank all the reviewers for their valuable comments

Neural Information Processing Systems, Aug-16-2025, 08:25:53 GMT

We thank all the reviewers for their valuable comments. We would like to clarify that, 'When the model was trained without the mel-spectrogram loss, the training process ... We also think that applying the L1/L2 loss gives no disadvantage in one-to-one mapping as our work. We will clarify the details of the experiments in Section 3. ... Table 1: Mean Opinion Scores. All models were trained up to 500k steps. MOS evaluation results are shown in [Table 1].

melgan, reviewer, valuable comment, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.90)

9873eaad153c6c960616c89e54fe155a-Supplemental.pdf

Neural Information Processing Systems, Aug-15-2025, 07:04:09 GMT

architecture, convolution, implementation, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

spectrogram-based losses without such a term, and since adding the term greatly boosts performance, we also expect this contribution to have the greatest impact on future research

Neural Information Processing Systems, Aug-15-2025, 07:03:48 GMT

We thank all reviewers for their input on our paper. WaveFlow is a hybrid between autoregressive and parallel flow-based models. Add definition of p(x), q(y) after equation 1. (R3) Done.

contribution, repulsive term, spectrogram-based loss, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

A Spectral Energy Distance for Parallel Speech Synthesis

Gritsenko, Alexey A., Salimans, Tim, Berg, Rianne van den, Snoek, Jasper, Kalchbrenner, Nal

arXiv.org Machine Learning, Oct-23-2020

Speech synthesis is an important practical generative modeling problem that has seen great progress over the last few years, with likelihood-based autoregressive neural models now outperforming traditional concatenative systems. A downside of such autoregressive models is that they require executing tens of thousands of sequential operations per second of generated audio, making them ill-suited for deployment on specialized deep learning hardware. Here, we propose a new learning method that allows us to train highly parallel models of speech, without requiring access to an analytical likelihood function. Our approach is based on a generalized energy distance between the distributions of the generated and real audio. This spectral energy distance is a proper scoring rule with respect to the distribution over magnitude-spectrograms of the generated waveform audio and offers statistical consistency guarantees. The distance can be calculated from minibatches without bias, and does not involve adversarial learning, yielding a stable and consistent method for training implicit generative models. Empirically, we achieve state-of-the-art generation quality among implicit generative models, as judged by the recently-proposed cFDSD metric. When combining our method with adversarial techniques, we also improve upon the recently-proposed GAN-TTS model in terms of Mean Opinion Score as judged by trained human evaluators.
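As a rough illustration of the idea in the abstract above, the sketch below estimates a generalized energy distance, 2 E[d(x, y)] - E[d(x, x')] - E[d(y, y')], between a batch of real and a batch of generated waveforms. It is a minimal sketch, not the paper's implementation: the distance d used here (a mean L1 gap between single-resolution magnitude spectrograms), the frame and hop sizes, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def magnitude_spectrogram(wave, frame=256, hop=128):
    """Magnitude STFT of a 1-D waveform with a Hann window (illustrative settings)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(wave) - frame) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + frame] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


def spec_distance(a, b):
    """Assumed distance d: mean L1 gap between the two magnitude spectrograms."""
    return np.mean(np.abs(magnitude_spectrogram(a) - magnitude_spectrogram(b)))


def energy_distance_sq(real, fake):
    """Minibatch estimate of 2 E[d(x, y)] - E[d(x, x')] - E[d(y, y')].

    `real` and `fake` are lists of equal-length waveforms, each with at
    least two entries so the within-batch terms are defined.
    """

    def mean_cross(u, v):
        return np.mean([spec_distance(a, b) for a in u for b in v])

    def mean_within(u):
        # Average over distinct pairs only, so d(a, a) = 0 does not bias the term.
        n = len(u)
        return np.mean(
            [spec_distance(u[i], u[j]) for i in range(n) for j in range(n) if i != j]
        )

    return 2.0 * mean_cross(real, fake) - mean_within(real) - mean_within(fake)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(2048) / 16000.0  # 2048 samples at an assumed 16 kHz rate
    real = [np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(t.size) for _ in range(3)]
    fake = [np.sin(2 * np.pi * 2000.0 * t) + 0.01 * rng.standard_normal(t.size) for _ in range(3)]
    print(energy_distance_sq(real, fake))
```

The subtracted within-batch term over generated samples rewards samples that differ from one another, which is why no discriminator is needed to keep the generator from collapsing to a single output. Because every term is an average over sample pairs, the estimate can be computed directly from minibatches.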

artificial intelligence, deep learning, machine learning, (17 more...)

2008.0116

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.82)

Industry:

Education (0.46)
Leisure & Entertainment (0.35)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DeepMind Generates High Fidelity Speech With GAN-TTS

#artificialintelligence, Oct-15-2019, 17:54:34 GMT

GANs have achieved state-of-the-art results in image and video generation, and have been successfully applied to unsupervised feature learning, among many other applications. Generative adversarial networks have developed rapidly in recent years; however, their audio generation abilities have received little attention. To explore those abilities, a team of DeepMind researchers published a paper introducing a new model called GAN-TTS. Text-to-Speech (TTS) is the process of converting text into humanlike voice output. Many audio generation models operate in the waveform domain.

autoregressive model, deepmind generate high fidelity speech, gan-tts, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Google's highly scalable AI can generate convincingly humanlike speech

#artificialintelligence, Oct-1-2019, 10:04:03 GMT

A generative adversarial network (GAN) is a versatile AI architecture type that's exceptionally well-suited to synthesizing images, videos, and text from limited data. But it has not been applied much to audio production, owing to a number of design challenges, which is why Google and Imperial College London researchers set out to create a GAN-based text-to-speech system capable of matching (or besting) state-of-the-art methods. They say that their model not only generates high-fidelity speech with "naturalness" but that it's highly parallelizable, meaning it's more easily trained across multiple machines compared with conventional alternatives. "A notable limitation of [state-of-the-art TTS] models is that they are difficult to parallelize over time: they predict each time step of an audio signal in sequence, which is computationally expensive and often impractical," wrote the coauthors. "A lot of recent research on neural models for TTS has focused on improving parallelism by predicting multiple time steps in parallel. An alternative approach for parallel waveform generation would be to use generative adversarial networks … To the best of our knowledge, GANs have not yet been applied at large scale to non-visual domains."

gan-tts, generate convincingly humanlike speech, speech, (7 more...)

Genre: Research Report > New Finding (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
