NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention
Dashanka De Silva, Siqi Cai, Saurav Pahuja, Tanja Schultz, Haizhou Li
arXiv.org Artificial Intelligence
Studies of auditory attention have revealed a robust correlation between attended speech and the listener's neural responses, measurable through electroencephalography (EEG). The attention information carried in EEG signals can therefore be used to computationally guide the extraction of the target speaker in a cocktail party scenario. In this paper, we present a neuro-guided speaker extraction model, NeuroSpex, which uses the listener's EEG response as the sole auxiliary reference cue to extract attended speech from monaural speech mixtures. We propose a novel EEG signal encoder that captures the attention information, together with a cross-attention (CA) mechanism that enhances the speech feature representations to generate a speaker extraction mask. Experimental results on a publicly available dataset demonstrate that our proposed model outperforms two baseline models across various evaluation metrics.
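The abstract describes cross-attention between EEG and speech representations producing a multiplicative extraction mask. A minimal NumPy sketch of that idea is shown below: speech-frame embeddings act as queries, EEG embeddings supply keys and values, and the attended context is mapped through a sigmoid to a per-frame mask. All shapes, projection matrices (`Wq`, `Wk`, `Wv`, `Wm`), and the assumption of time-aligned speech/EEG frames are illustrative, not the paper's exact NeuroSpex architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_attention_mask(speech, eeg, Wq, Wk, Wv, Wm):
    """Attend from speech frames (queries) to EEG frames (keys/values),
    then map the attended context to a per-frame extraction mask."""
    Q, K, V = speech @ Wq, eeg @ Wk, eeg @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (T_speech, T_eeg) similarities
    context = softmax(scores) @ V             # EEG-informed context per speech frame
    return sigmoid(context @ Wm)              # mask values in (0, 1)

# Toy dimensions; real models use learned encoders, not random projections.
T, d = 40, 16
speech = rng.standard_normal((T, d))  # mixture speech embeddings (hypothetical)
eeg = rng.standard_normal((T, d))     # listener EEG embeddings (hypothetical)
Wq, Wk, Wv, Wm = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))

mask = cross_attention_mask(speech, eeg, Wq, Wk, Wv, Wm)
extracted = mask * speech             # masked (extracted) speech representation
```

In a trained system the mask would be applied to the mixture's encoded representation and decoded back to a waveform; here the element-wise product simply illustrates the masking step.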
Sep-16-2024
- Country:
- Asia
- China
- Guangdong Province > Shenzhen (0.04)
- Hong Kong (0.04)
- Singapore > Central Region
- Singapore (0.04)
- Europe > Germany
- North America > United States (0.04)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.49)
- Technology:
- Information Technology
- Artificial Intelligence
- Cognitive Science (0.68)
- Machine Learning > Neural Networks
- Deep Learning (0.46)
- Speech (0.70)
- Data Science (1.00)