End-to-End Multi-Microphone Speaker Extraction Using Relative Transfer Functions
Aviad Eisenberg, Sharon Gannot, Shlomo E. Chazan
arXiv.org Artificial Intelligence
This paper introduces a multi-microphone method for extracting a desired speaker from a mixture of multiple speakers and directional noise in a reverberant environment. We propose leveraging the instantaneous relative transfer function (RTF), estimated from a reference utterance recorded at the same position as the desired source. The effectiveness of the RTF-based spatial cue is compared with that of a direction-of-arrival (DOA)-based spatial cue and of the conventional spectral embedding. Experimental results in challenging acoustic scenarios demonstrate that spatial cues yield better performance than the spectral cue and that the instantaneous RTF outperforms the DOA-based spatial cue.
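The abstract does not say how the RTF is estimated from the reference utterance or how it is fed to the extraction network. As a rough illustration only, the sketch below applies a standard least-squares RTF estimator (cross power spectral density with the reference microphone divided by the reference auto-PSD) to the STFT of a noise-free reference utterance. This generic estimator is an assumption and is not necessarily the paper's "instantaneous RTF". For contrast, the hypothetical helper `free_field_rtf` builds a DOA-only cue. In an anechoic far-field setting, the RTF reduces to pure inter-microphone phase delays set by the DOA. In a reverberant room, the RTF also captures reflections that a single DOA cannot represent, which may be one reason the RTF cue performs better. The array geometry, 5 cm spacing, 16 kHz sample rate, and 60° DOA are all hypothetical.

```python
import numpy as np


def estimate_rtf(stft_mics: np.ndarray, ref_mic: int = 0, eps: float = 1e-10) -> np.ndarray:
    """Least-squares RTF estimate from a (noise-free) reference utterance.

    stft_mics: complex STFT of shape (M, K, T): M microphones, K bins, T frames.
    Returns a complex array of shape (M, K) with
        h_m(k) = sum_t X_m(k, t) X_ref(k, t)^* / sum_t |X_ref(k, t)|^2,
    so the reference-microphone entry is (approximately) 1.
    """
    x_ref = stft_mics[ref_mic]                                      # (K, T)
    cross_psd = np.sum(stft_mics * np.conj(x_ref)[None], axis=-1)   # (M, K)
    ref_psd = np.sum(np.abs(x_ref) ** 2, axis=-1)                   # (K,)
    return cross_psd / (ref_psd[None] + eps)


def free_field_rtf(doa_rad: float, mic_x: np.ndarray, freqs: np.ndarray,
                   c: float = 343.0, ref_mic: int = 0) -> np.ndarray:
    """Anechoic far-field RTF of a linear array on the x-axis (a DOA-only cue).

    A plane wave from angle doa_rad (measured from the array axis) reaches the
    microphone at position x earlier by x*cos(doa)/c, so its arrival delay
    relative to the reference microphone is tau_m = -(x_m - x_ref)*cos(doa)/c.
    """
    tau = -(mic_x - mic_x[ref_mic]) * np.cos(doa_rad) / c           # (M,) seconds
    return np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])      # (M, K)


# Hypothetical anechoic simulation in the STFT domain.
rng = np.random.default_rng(0)
M, K, T, fs = 4, 257, 100, 16000
freqs = np.linspace(0, fs / 2, K)                  # bin centre frequencies in Hz
mic_x = np.arange(M) * 0.05                        # 5 cm uniform linear array
h_true = free_field_rtf(np.deg2rad(60.0), mic_x, freqs)
s = rng.standard_normal((K, T)) + 1j * rng.standard_normal((K, T))  # source at ref mic
X = h_true[:, :, None] * s[None]                   # multiplicative transfer function model
h_est = estimate_rtf(X)
print(np.allclose(h_est, h_true))                  # True
```

Under the multiplicative transfer function approximation with no reverberation, the estimator recovers the free-field (DOA-determined) RTF. With real reverberant recordings, the estimated RTF would generally deviate from any single-DOA steering vector, and this extra spatial detail is what the RTF-based cue can carry.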
Feb-10-2025