Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
