Individualized Conditioning and Negative Distances for Speaker Separation

Sun, Tao, Abuhajar, Nidal, Gong, Shuyu, Wang, Zhewei, Smith, Charles D., Wang, Xianhui, Xu, Li, Liu, Jundong

Oct-12-2022, arXiv.org Artificial Intelligence

Speaker separation aims to extract multiple voices from a mixed signal. In this paper, we propose two speaker-aware designs to improve existing speaker separation solutions. The first design is a speaker conditioning network that integrates speech samples to generate individualized speaker conditions, which then provide informed guidance for a separation module to produce well-separated outputs. The second design aims to reduce non-target voices in the separated speech. To this end, we propose negative distances to penalize the appearance of any non-target voice in the channel outputs, and positive distances to drive the separated voices closer to the clean targets. We explore two setups, weighted-sum and triplet-like, for integrating these two distances into a combined auxiliary loss for the separation networks. Experiments conducted on LibriMix demonstrate the effectiveness of the proposed models.
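
The abstract names positive and negative distances and two ways of combining them, but it does not give their exact form. The following is a minimal sketch of how such an auxiliary loss could look, not the authors' formulation. It assumes mean squared error as the distance, and the function names and the `alpha` and `margin` parameters are hypothetical, introduced only for illustration.

```python
from typing import Sequence

Signal = Sequence[float]


def mse_distance(a: Signal, b: Signal) -> float:
    """Mean squared error between two equal-length signals (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def positive_distance(estimates: Sequence[Signal], targets: Sequence[Signal]) -> float:
    """Average distance between each channel output and its own clean target."""
    return sum(mse_distance(e, t) for e, t in zip(estimates, targets)) / len(estimates)


def negative_distance(estimates: Sequence[Signal], targets: Sequence[Signal]) -> float:
    """Average distance between each channel output and every non-target voice."""
    if len(targets) < 2:
        raise ValueError("need at least two speakers to form non-target pairs")
    dists = [
        mse_distance(e, t)
        for i, e in enumerate(estimates)
        for j, t in enumerate(targets)
        if i != j
    ]
    return sum(dists) / len(dists)


def weighted_sum_loss(estimates, targets, alpha: float = 0.1) -> float:
    """Assumed weighted-sum setup: shrink the positive distance while
    rewarding a large negative distance, traded off by alpha."""
    return positive_distance(estimates, targets) - alpha * negative_distance(estimates, targets)


def triplet_like_loss(estimates, targets, margin: float = 1.0) -> float:
    """Assumed triplet-like setup: hinge that becomes zero once the output is
    closer to its target than to non-target voices by at least the margin."""
    gap = positive_distance(estimates, targets) - negative_distance(estimates, targets)
    return max(0.0, gap + margin)


# Toy two-speaker example: a clean separation versus one with leakage.
targets = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
clean_outputs = [list(t) for t in targets]
leaky_outputs = [[1.0, 0.5, 1.0, 0.5], [0.5, 1.0, 0.5, 1.0]]

for name, outputs in (("clean", clean_outputs), ("leaky", leaky_outputs)):
    print(name, weighted_sum_loss(outputs, targets), triplet_like_loss(outputs, targets))
```

In this toy example, outputs that leak the other speaker's voice score higher under both combinations than perfectly separated outputs, which is the behavior the abstract describes. A real system would compute these terms on batched tensors and add the result to the main separation loss.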

artificial intelligence, machine learning, separation

Country:
- North America > United States
  - Ohio > Athens County
    - Athens (0.04)
  - Massachusetts > Suffolk County
    - Boston (0.04)
  - Kentucky > Fayette County
    - Lexington (0.14)

Genre:
- Research Report (0.50)

Industry:
- Media (0.46)
- Health & Medicine (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (0.70)
  - Machine Learning > Neural Networks (0.47)
