Supplementary Material of Glow-TTS: A Generative Flow for T ext-to-Speech via Monotonic Alignment Search Appendix A

Neural Information Processing Systems 

The detailed encoder architecture is depicted in Figure 7. We design the grouped 1x1 convolutions to be able to mix channels. Figure 8c shows an example. The decoder gets a mel-spectrogram and squeezes it. The, the decoder processes it through a number of flow blocks.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found