AITopics | glow-tts

Collaborating Authors

glow-tts

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Glow-TTS: AGenerativeFlowforText-to-Speechvia MonotonicAlignmentSearch

Neural Information Processing SystemsFeb-19-2026, 02:05:39 GMT

In this work, we propose Glow-TTS, a flow-based generativemodel for parallel TTS that does not require anyexternal aligner.

artificial intelligence, glow-tts, machine learning, (19 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech (0.97)

Add feedback

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Neural Information Processing SystemsDec-24-2025, 02:07:40 GMT

Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been proposed to generate mel-spectrograms from text in parallel. Despite the advantage, the parallel TTS models cannot be trained without guidance from autoregressive TTS models as their external aligners. In this work, we propose Glow-TTS, a flow-based generative model for parallel TTS that does not require any external aligner. By combining the properties of flows and dynamic programming, the proposed model searches for the most probable monotonic alignment between text and the latent representation of speech on its own. We demonstrate that enforcing hard monotonic alignments enables robust TTS, which generalizes to long utterances, and employing generative flows enables fast, diverse, and controllable speech synthesis.

generative flow, glow-tts, text-to-speech, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.64)

Add feedback

Supplementary Material of Glow-TTS: A Generative Flow for T ext-to-Speech via Monotonic Alignment Search Appendix A

Neural Information Processing SystemsOct-3-2025, 00:28:10 GMT

The detailed encoder architecture is depicted in Figure 7. We design the grouped 1x1 convolutions to be able to mix channels. Figure 8c shows an example. The decoder gets a mel-spectrogram and squeezes it. The, the decoder processes it through a number of flow blocks.

artificial intelligence, glow-tts, tacotron 2, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.30)

Add feedback

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Neural Information Processing SystemsOct-3-2025, 00:28:02 GMT

In this work, we propose Glow-TTS, a flow-based generative model for parallel TTS that does not require any external aligner.

artificial intelligence, glow-tts, machine learning, (16 more...)

Neural Information Processing Systems

Industry:

Information Technology (0.68)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.69)

Add feedback

Thanks all the reviewers for the detailed and thoughtful comments

Neural Information Processing SystemsOct-3-2025, 00:27:52 GMT

Thanks all the reviewers for the detailed and thoughtful comments. HMM-based works [1, 2, 3], all of which proposed methods to estimate alignments from unsegmented data. We've not thoroughly explored to improve the duration predictor and simply follow the same We design the grouped 1x1 convolutions to be able to mix channels. For example, to generate a speech of 5.8 Therefore, adopting parallel TTS models significantly improves the sampling speed of end-to-end systems. In Section 5.3, we showed that varying temperature can change We will add a reference about Viterbi training.

artificial intelligence, machine learning, tacotron 2, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Add feedback

Review for NeurIPS paper: Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Neural Information Processing SystemsJan-24-2025, 18:18:02 GMT

Weaknesses: I was a little confused about how the grouped 1x1 convolutions interact with the coupling layers. If the standard (half-and-half) partitioning is used for the coupling layers and the grouped 1x1 convolutions never mix channels outside of their group of 4, then half of the channels will never be transformed by any coupling layer. I'm assuming the authors deal with this issue somehow (since the results are good), but I only briefly scanned the code and didn't want to work through all of the index gymnastics. I could see readers being confused by these missing details. Update: In their response, the authors said they will explain more of the details of the grouped 1x1 convolutions in their revised version.

generative flow, monotonic alignment search, vocoder, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.40)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.40)
Information Technology > Artificial Intelligence > Assistive Technologies (0.40)

Add feedback

Review for NeurIPS paper: Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Neural Information Processing SystemsJan-24-2025, 18:17:55 GMT

After rebuttal and discussion, all four reviewers provide very favorable reviews. The reviewers point out a novel methodology, combining flows with dynamic programming (hard monotonic alignment). The paper is therefore accepted for an oral.

generative flow, monotonic alignment search, text-to-speech, (2 more...)

Neural Information Processing Systems

Technology: