Stepback: Enhanced Disentanglement for Voice Conversion via Multi-Task Learning

Yang, Qian, Graham, Calbert

arXiv.org Artificial Intelligence 

Voice conversion (VC) modifies voice characteristics while preserving linguistic content. This paper presents the Stepback network, a novel model for converting speaker identity using non-parallel data. Unlike traditional VC methods that rely on parallel data, our approach leverages deep learning techniques to enhance disentanglement completion and linguistic content preservation.

VAEs consist of two main parts: a content encoder and a decoder. The content encoder processes source speech, transforms it into a latent representation, and removes speaker information. The decoder takes the speaker identity, combines it with the latent representation, and reconstructs the speech [5]. A notable VAE approach is disentangling speaker and content representations using instance normalization, which