On permutation invariant training for speech source separation

Feb-9-2021–arXiv.org Artificial Intelligence

We study permutation invariant training (PIT), which targets the permutation ambiguity problem for speaker-independent source separation models. We extend two state-of-the-art PIT strategies. First, we look at the two-stage speaker separation and tracking algorithm based on frame-level PIT (tPIT) and clustering, which was originally proposed for the STFT domain, and we adapt it to work with waveforms and over a learned latent space. Further, we propose an efficient clustering loss scalable to waveform models.

[...] Deep CASA, a spectrogram-based model, to Conv-TasNet, which uses very short waveform frames (such as 2 ms). We find that tPIT based on such short waveform frames can be challenging. Therefore, we propose performing tPIT in a pre-trained latent space, which allows for a more meaningful feature space for tPIT than the short waveform frames. Further, when training the clustering model, Deep CASA employs a memory- and computation-intensive pairwise similarity loss that does not scale to waveform inputs. We propose a loss that reduces the complexity from quadratic to linear, making [...]
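To make the distinction between utterance-level and frame-level PIT concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation: the function names (`frame_mse`, `upit_loss`, `tpit_loss`), the nested-list data layout, and the use of mean squared error as the per-frame loss are all assumptions chosen for illustration.

```python
from itertools import permutations


def frame_mse(est_frame, ref_frame):
    """Mean squared error between two equal-length frames (assumed loss)."""
    return sum((e - r) ** 2 for e, r in zip(est_frame, ref_frame)) / len(ref_frame)


def upit_loss(est, ref):
    """Utterance-level PIT: a single speaker permutation for the whole utterance.

    est, ref: lists of sources; each source is a list of frames;
    each frame is a list of floats (hypothetical layout).
    Returns (loss, permutation).
    """
    n_src, n_frames = len(ref), len(ref[0])
    best = None
    for perm in permutations(range(n_src)):
        loss = sum(
            frame_mse(est[perm[s]][t], ref[s][t])
            for s in range(n_src)
            for t in range(n_frames)
        ) / (n_src * n_frames)
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best


def tpit_loss(est, ref):
    """Frame-level PIT: the best permutation is picked independently per frame.

    Returns (loss, list of per-frame permutations).
    """
    n_src, n_frames = len(ref), len(ref[0])
    total, perms = 0.0, []
    for t in range(n_frames):
        loss, perm = min(
            (sum(frame_mse(est[p[s]][t], ref[s][t]) for s in range(n_src)) / n_src, p)
            for p in permutations(range(n_src))
        )
        total += loss
        perms.append(perm)
    return total / n_frames, perms


# Toy data: two sources, two frames; the estimate swaps speakers in frame 1.
ref = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
est = [[[1.0, 1.0], [-1.0, -1.0]], [[-1.0, -1.0], [1.0, 1.0]]]

u_loss, u_perm = upit_loss(est, ref)
t_loss, t_perms = tpit_loss(est, ref)
```

In this toy case no single permutation fits both frames, so the utterance-level loss stays high, while the frame-level loss reaches zero with a different permutation in each frame. Those per-frame permutations still have to be stitched into consistent speaker tracks, which is the role of the clustering stage mentioned above.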

conv-tasnet, permutation error, separation


