DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion
Yuan, Ruibin; Wu, Yuxuan; Li, Jacob; Kim, Jaxter
arXiv.org Artificial Intelligence
The widespread adoption of speech-based online services raises security and privacy concerns about the data they use and share. If the data were compromised, attackers could exploit user speech to bypass speaker verification systems or even impersonate users. To mitigate this, we propose DeID-VC, a speaker de-identification system that converts a real speaker's voice into that of a pseudo speaker, thereby removing or obfuscating the speaker-dependent attributes of the spoken voice. The key components of DeID-VC are a Variational Autoencoder (VAE) based Pseudo Speaker Generator (PSG) and a voice conversion Autoencoder (AE) operating under zero-shot settings. With the help of the PSG, DeID-VC can assign unique pseudo speakers at the speaker level or even at the utterance level. In addition, two novel learning objectives are introduced to bridge the gap between training and inference in zero-shot voice conversion. We report experimental results in terms of word error rate (WER) and equal error rate (EER), along with three subjective metrics, to evaluate the output generated by DeID-VC. The results show that our method substantially improves intelligibility (WER 10% lower) and de-identification effectiveness (EER 5% higher) compared to our baseline. Code and listening demo: https://github.com/a43992899/DeID-VC
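To make the EER figure in the abstract concrete, here is a minimal sketch of how an equal error rate can be computed from speaker verification scores. It is an illustration only and not the paper's evaluation code: the function name `equal_error_rate` and the example scores are hypothetical, and the threshold sweep is a simple approximation that picks the point where the false acceptance and false rejection rates are closest. In a de-identification setting, a higher EER means the verifier has a harder time matching converted speech to the original speaker.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from verification scores (higher score = more likely same speaker).

    genuine:  scores for trials where both utterances come from the same speaker
    impostor: scores for trials where the utterances come from different speakers
    """
    # Candidate thresholds: every distinct score observed in either set.
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False acceptance rate: impostor trials accepted at this threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: genuine trials rejected at this threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical scores, chosen only to illustrate the calculation.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))
```

With these made-up scores, one genuine trial and one impostor trial fall on the wrong side of the best threshold, so both error rates meet at one in four.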
Sep-9-2022
- Country:
- Oceania > Australia
- Queensland > Brisbane (0.04)
- North America
- United States
- Utah > Salt Lake County
- Salt Lake City (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- Florida > Hillsborough County
- University (0.04)
- California > Los Angeles County
- Long Beach (0.04)
- Canada > Alberta
- Europe > Spain
- Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > China
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Speech (1.00)
- Machine Learning > Neural Networks (1.00)
- Natural Language > Large Language Model (0.91)