End-to-End Multi-Microphone Speaker Extraction Using Relative Transfer Functions

Eisenberg, Aviad, Gannot, Sharon, Chazan, Shlomo E.

Feb-10-2025–arXiv.org Artificial Intelligence

This paper introduces a multi-microphone method for extracting a desired speaker from a mixture involving multiple speakers and directional noise in a reverberant environment. We propose using the instantaneous relative transfer function (RTF), estimated from a reference utterance recorded in the same position as the desired source, as the spatial cue. The effectiveness of the RTF-based spatial cue is compared with that of a direction-of-arrival (DOA)-based spatial cue and of the conventional spectral embedding. Experimental results in challenging acoustic scenarios demonstrate that spatial cues yield better performance than the spectral-based cue and that the instantaneous RTF outperforms the DOA-based spatial cue.
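To make the RTF-based cue concrete, the sketch below shows one common way to form an instantaneous RTF: for every time-frequency bin, divide each microphone's STFT coefficient by that of a reference microphone. The abstract does not state the estimator the authors use, so the function name `instantaneous_rtf`, the `ref_mic` and `eps` parameters, the regularised ratio, and the toy data are all assumptions made for illustration.

```python
from typing import List

# STFT coefficients indexed as [microphone][frame][frequency bin].
Stft = List[List[List[complex]]]


def instantaneous_rtf(stft: Stft, ref_mic: int = 0, eps: float = 1e-8) -> Stft:
    """Per-bin ratio of each microphone's STFT to the reference microphone's.

    Hypothetical helper, not taken from the paper.
    """
    reference = stft[ref_mic]
    rtf = []
    for channel in stft:
        frames = []
        for x_frame, r_frame in zip(channel, reference):
            # x * conj(r) / (|r|^2 + eps) is close to x / r when |r|^2 >> eps
            # and stays finite in bins where the reference is near zero.
            frames.append([
                x * r.conjugate() / (abs(r) ** 2 + eps)
                for x, r in zip(x_frame, r_frame)
            ])
        rtf.append(frames)
    return rtf


# Toy input (made up): 2 microphones, 1 frame, 2 frequency bins.
# Microphone 1 equals microphone 0 scaled by 0.5 and rotated by 90 degrees.
toy = [
    [[1 + 0j, 0 + 2j]],
    [[0 + 0.5j, -1 + 0j]],
]
rtf = instantaneous_rtf(toy)
```

In this toy case every bin of `rtf[1]` is approximately `0.5j` and every bin of `rtf[0]` is approximately `1`, since the reference channel divided by itself is one. In a setup like the one the abstract describes, such a ratio would be computed on the reference utterance and passed to the extraction network as the spatial cue. How the paper smooths, normalises, or encodes it is not specified here.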

artificial intelligence, machine learning, speech recognition

Country:
- Europe
  - Netherlands > North Brabant
    - Eindhoven (0.04)
  - Italy > Lazio
    - Rome (0.04)
- Asia > Middle East
  - Israel (0.05)

Genre:
- Research Report > New Finding (0.47)

Technology:
- Information Technology
  - Data Science (0.94)
  - Artificial Intelligence
    - Speech > Speech Recognition (0.97)
    - Machine Learning > Neural Networks (0.69)
