Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

Koizumi, Yuma; Yatabe, Kohei; Delcroix, Marc; Masuyama, Yoshiki; Takeuchi, Daiki

Feb-14-2020–arXiv.org Machine Learning

This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract the speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural network (DNN)-based speech enhancement mainly focus on building a speaker-independent model. Meanwhile, in speech applications including speech recognition and synthesis, it is known that adapting the model to the target speaker improves accuracy. Our research question is whether a DNN for speech enhancement can be adapted to unknown speakers without any auxiliary guidance signal in the test phase. To achieve this, we adopt multi-task learning of speech enhancement and speaker identification, and use the output of the final hidden layer of the speaker identification branch as an auxiliary feature. In addition, we use multi-head self-attention to capture long-term dependencies in the speech and noise. Experimental results on a public dataset show that our strategy achieves state-of-the-art performance and also outperforms conventional methods in terms of subjective quality.
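The two ingredients named in the abstract, an auxiliary speaker feature and multi-head self-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' network: the abstract does not say how the speaker feature is combined with the input, so tiling one utterance-level vector across time and concatenating it is an assumption, and the names `append_speaker_feature` and `multi_head_self_attention`, the dimensions, and the random weights are all hypothetical stand-ins.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def append_speaker_feature(frames, speaker_hidden):
    # ASSUMPTION: the speaker vector is utterance-level, tiled over time,
    # and concatenated to every frame. The abstract does not specify this.
    tiled = np.tile(speaker_hidden, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # x: (T, D) frame sequence; all weights: (D, D); D divisible by num_heads.
    T, D = x.shape
    d_head = D // num_heads

    def split(h):
        # (T, D) -> (num_heads, T, d_head)
        return h.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Scaled dot-product scores between every pair of frames, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, T)
    attn = softmax(scores, axis=-1)
    context = attn @ v  # (H, T, d_head)
    merged = context.transpose(1, 0, 2).reshape(T, D)
    return merged @ w_o, attn


rng = np.random.default_rng(0)
T, D, S, H = 50, 32, 8, 4  # frames, feature dim, speaker dim, heads (all hypothetical)
frames = rng.standard_normal((T, D))
# Stand-in for the final hidden layer output of a speaker identification branch.
speaker_hidden = rng.standard_normal(S)

x = append_speaker_feature(frames, speaker_hidden)  # (T, D + S)
dim = D + S
weights = [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4)]
out, attn = multi_head_self_attention(x, *weights, num_heads=H)
```

Each row of `attn[h]` is a distribution over all frames of the utterance, which is how self-attention lets a frame draw on context far away in time rather than only on its neighbours.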



Country:
- North America > United States
  - California > Santa Clara County > Palo Alto (0.04)
- Asia > Japan
  - Honshū
    - Kantō > Tokyo Metropolis Prefecture
      - Tokyo (0.14)
    - Kansai > Kyoto Prefecture
      - Kyoto (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
