P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting

Sungwon Kim, Kevin J Shih

Neural Information Processing Systems

Our work proposes P-Flow, a fast and data-efficient zero-shot TTS model that uses speech prompts for speaker adaptation. P-Flow comprises a speech-prompted text encoder for speaker adaptation and a flow matching generative decoder for high-quality and fast speech synthesis.
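The flow matching decoder mentioned above can be illustrated with a generic sketch. This is not P-Flow's implementation: it assumes the common straight-line probability path between a noise sample `x0` and a data sample `x1`, and the names `cfm_training_pair`, `euler_sample`, and `oracle_velocity` are hypothetical. A closed-form velocity field for a single target point stands in for the trained network.

```python
import numpy as np


def cfm_training_pair(x0, x1, t):
    """Build one flow matching training example (assumed straight-line path).

    x0: noise sample, x1: data sample, t: scalar in [0, 1].
    Returns the interpolated point x_t and the velocity regression target.
    """
    x_t = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0  # constant along the straight path
    return x_t, target_velocity


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1 inside the loop
        x = x + dt * velocity_fn(x, t)
    return x


def oracle_velocity(x1):
    """Hypothetical stand-in for a trained decoder.

    When the target distribution is a single point x1, the exact
    velocity field of the straight-line path is (x1 - x) / (1 - t).
    """
    def v(x, t):
        return (x1 - x) / (1.0 - t)
    return v


rng = np.random.default_rng(0)
noise = rng.standard_normal(4)
target = np.array([1.0, -2.0, 0.5, 3.0])
sample = euler_sample(oracle_velocity(target), noise, num_steps=10)
```

In a real model, `velocity_fn` would be a neural network trained to regress `target_velocity` at `x_t`, conditioned on the text encoder output. Because the assumed path is nearly straight, a small `num_steps` can suffice, which is one common explanation for the fast sampling of flow matching decoders.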



DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets Supplementary Material

Neural Information Processing Systems

Chen et al. [2022] prove that in such a binary-classification problem, an MoE layer converges to an … Proof: We will prove this by contradiction. … Thus, we show that vanilla MoE does not guarantee convergence with a mixture of datasets.