3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

Zheng, Siqi, Cheng, Luyao, Chen, Yafeng, Wang, Hui, Chen, Qian

Sep-24-2023–arXiv.org Artificial Intelligence

Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other, uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is simultaneously recorded by multiple Devices located at different Distances, and some speakers speak multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix with a diverse blend of speech representation entanglement, thereby motivating new methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with methods of out-of-domain learning and self-supervised learning. https://3dspeaker.github.io/

annual conference, information, international speech communication association, (11 more...)

Country:
- Oceania > Australia
  - Queensland > Brisbane (0.04)
- North America
  - United States
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California
      - San Francisco County > San Francisco (0.14)
      - Los Angeles County > Long Beach (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - United Kingdom > England
    - East Sussex > Brighton (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Czechia > South Moravian Region
    - Brno (0.05)
  - Austria > Styria
    - Graz (0.04)
- Asia
  - Singapore (0.04)
  - India > Telangana
    - Hyderabad (0.04)
  - China
    - Shanghai > Shanghai (0.05)
    - Beijing > Beijing (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (0.74)
  - Machine Learning > Inductive Learning (0.50)
