BiSinger: Bilingual Singing Voice Synthesis

Zhou, Huali; Lin, Yueqian; Shi, Yao; Sun, Peng; Li, Ming

Jan-9-2024, arXiv.org Artificial Intelligence

Although Singing Voice Synthesis (SVS) has made great strides by adopting Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Mandarin Chinese. Current systems require a separate model per language and cannot accurately represent both Chinese and English, which hinders code-switch SVS. To address this gap, we design a shared representation for Chinese and English singing voices, built from the CMU Pronouncing Dictionary together with mapping rules. We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices, and we also explore the potential use of bilingual speech data. Experiments confirm that our language-independent representation and the incorporation of related datasets enable a single model to achieve improved performance on English and code-switch SVS while maintaining its performance on Chinese songs. Audio samples are available at https://bisinger-svs.github.io.
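The idea of a shared Chinese/English representation can be illustrated with a small sketch. The abstract does not specify BiSinger's actual phoneme inventory or mapping rules, so everything below is an assumption for illustration: `CMU_LEXICON` is a four-word stand-in for the CMU Pronouncing Dictionary (the ARPAbet entries shown match CMUdict), while `ARPABET_TO_SHARED`, the pinyin initial/final split, and all function names are hypothetical. The point is only that English words (via a CMU-style lookup plus mapping rules) and Mandarin pinyin syllables (via initial/final splitting) end up as tokens from one inventory that a single model can consume.

```python
import re

# Hypothetical miniature lexicon in CMU Pronouncing Dictionary style
# (ARPAbet phones, stress digits 0/1/2 on vowels).
CMU_LEXICON = {
    "HELLO": ["HH", "AH0", "L", "OW1"],
    "LOVE": ["L", "AH1", "V"],
    "YOU": ["Y", "UW1"],
    "SONG": ["S", "AO1", "NG"],
}

# Hypothetical mapping rules: ARPAbet phone (stress removed) -> shared symbol.
# Phones assumed close to a Mandarin unit reuse its pinyin-style symbol;
# others (e.g. V) keep an English-only symbol instead of being dropped.
ARPABET_TO_SHARED = {
    "HH": "h", "L": "l", "S": "s", "Y": "y", "NG": "ng",
    "AH": "e", "OW": "ou", "UW": "u", "AO": "o",
    "V": "en_v",
}

# Pinyin initials (y/w treated as initials for simplicity), longest first
# so that "zh" is matched before "z".
PINYIN_INITIALS = sorted(
    ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h", "j", "q", "x",
     "zh", "ch", "sh", "r", "z", "c", "s", "y", "w"],
    key=len, reverse=True,
)


def english_to_shared(word):
    """Look up an English word and map its ARPAbet phones to shared symbols."""
    phones = CMU_LEXICON[word.upper()]
    return [ARPABET_TO_SHARED[re.sub(r"\d", "", p)] for p in phones]


def pinyin_to_shared(syllable):
    """Split a tone-numbered pinyin syllable (e.g. 'ni3') into initial + final."""
    base = syllable.rstrip("012345")
    for ini in PINYIN_INITIALS:
        if base.startswith(ini) and len(base) > len(ini):
            return [ini, base[len(ini):]]
    return [base]  # zero-initial syllable such as 'ai'


def to_shared_sequence(tokens):
    """Convert (language, token) pairs from a code-switched lyric into a
    single phoneme sequence drawn from one shared inventory."""
    seq = []
    for lang, tok in tokens:
        if lang == "en":
            seq.extend(english_to_shared(tok))
        else:
            seq.extend(pinyin_to_shared(tok))
    return seq


if __name__ == "__main__":
    lyric = [("zh", "wo3"), ("zh", "ai4"), ("en", "you")]
    print(to_shared_sequence(lyric))  # ['w', 'o', 'ai', 'y', 'u']
```

In this sketch the English "you" and a Mandarin syllable such as "yu" would share the symbols `y` and `u`, which is the kind of cross-lingual sharing a single bilingual model can exploit; how aggressively to merge phones is a design choice, and the paper's own rules may differ.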

artificial intelligence, machine learning, synthesis

Country:
- Asia > China (0.14)
- North America > United States (0.14)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Leisure & Entertainment (0.70)
- Media > Music (0.70)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (1.00)
  - Speech (1.00)