3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Zheng, Siqi, Cheng, Luyao, Chen, Yafeng, Wang, Hui, Chen, Qian
arXiv.org Artificial Intelligence
Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is simultaneously recorded by multiple Devices located at different Distances, and some speakers speak multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of diversely entangled speech representations, thereby motivating new methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with methods of out-of-domain learning and self-supervised learning. https://3dspeaker.github.io/
Sep-24-2023