Single-Channel Multi-Speaker Separation using Deep Clustering

Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey

Jul-7-2016–arXiv.org Machine Learning

Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering. It was recently applied to spectrogram segmentation, yielding impressive results on speaker-independent multi-speaker separation. In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task. We first significantly improve upon the baseline system performance by incorporating better regularization, larger temporal context, and a deeper architecture, culminating in an overall improvement in signal-to-distortion ratio (SDR) of 10.3 dB compared to the baseline of 6.0 dB for two-speaker separation, as well as a 7.1 dB SDR improvement for three-speaker separation. We then extend the model to incorporate an enhancement layer to refine the signal estimates, and perform end-to-end training through both the clustering and enhancement stages to maximize signal fidelity. We evaluate the results using automatic speech recognition. The new signal approximation objective, combined with end-to-end training, produces unprecedented performance, reducing the word error rate (WER) from 89.1% down to 30.8%. This represents a major advancement towards solving the cocktail party problem.
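As a rough illustration of the SDR improvement figures quoted in the abstract, the sketch below computes a simplified energy-ratio SDR and the gain of a separated estimate over the unprocessed mixture. This is an assumption-laden toy: the abstract does not say how SDR was computed, published evaluations typically use a more elaborate toolkit-based definition, and the function names `sdr_db` and `sdr_improvement_db` and the synthetic signals are hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: plain energy ratio, not the evaluation toolkit used in the paper.
    """
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


def sdr_improvement_db(reference, mixture, estimate):
    """SDR of the estimate minus SDR of the unprocessed mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


# Toy signals (hypothetical): a target tone plus an interfering tone.
target = [math.sin(0.1 * n) for n in range(200)]
interferer = [math.sin(0.37 * n + 1.0) for n in range(200)]
mixture = [t + i for t, i in zip(target, interferer)]

# Pretend a separator attenuated the interferer amplitude by a factor of 10.
estimate = [t + 0.1 * i for t, i in zip(target, interferer)]

gain = sdr_improvement_db(target, mixture, estimate)
print(f"SDR improvement: {gain:.2f} dB")  # about 20 dB for a 10x amplitude cut
```

Reporting the gain relative to the mixture, rather than the raw SDR, makes results comparable across mixtures whose input signal-to-interference levels differ.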

deep learning, separation, speech recognition

Country:
- Asia > Middle East
  - Republic of Türkiye (0.28)
- Europe (0.28)
- North America > United States (0.46)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.47)
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (1.00)
  - Speech > Speech Recognition (1.00)
