AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang, Sukru Samet Dindar, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Guy M. McKhann, Daniel Friedman, Adeen Flinker, Nima Mesgarani
arXiv.org Artificial Intelligence
However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and to refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention, taking a first step toward intention-aware auditory AI.

Figure 1: AAD-LLM is a brain-computer interface (BCI) for auditory scene understanding. It decodes neural signals to identify the attended speaker and integrates this information into a language model, generating responses that align with the listener's perceptual focus.
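The abstract describes a two-stage pipeline: first decode the attended speaker from neural activity, then condition the language model's response on that attentional state. The sketch below is only an illustration of that idea, not the authors' implementation; it assumes a simple correlation-based attention decoder over speech envelopes (a common auditory attention decoding strategy) and plain prompt conditioning, and all function names, signal shapes, and the decoder choice are assumptions.

```python
import numpy as np

def decode_attended_speaker(neural_envelope: np.ndarray,
                            speaker_envelopes: list[np.ndarray]) -> int:
    """Correlate a neurally reconstructed speech envelope with each candidate
    speaker's envelope and return the index of the best-matching speaker."""
    corrs = [np.corrcoef(neural_envelope, env)[0, 1] for env in speaker_envelopes]
    return int(np.argmax(corrs))

def build_conditioned_prompt(question: str,
                             transcripts: list[str],
                             attended_idx: int) -> str:
    """Mark the decoded attended speaker in the prompt so the language model
    can ground its answer in the listener's perceptual focus."""
    lines = [
        f"Speaker {i}{' (attended)' if i == attended_idx else ''}: {t}"
        for i, t in enumerate(transcripts)
    ]
    return "\n".join(lines) + f"\nListener question: {question}"

# Toy example: synthetic envelopes in which speaker 1 matches the "neural" signal.
rng = np.random.default_rng(0)
speakers = [rng.standard_normal(2000) for _ in range(2)]
neural = speakers[1] + 0.5 * rng.standard_normal(2000)  # stand-in for an iEEG-decoded envelope
attended = decode_attended_speaker(neural, speakers)     # -> 1
print(build_conditioned_prompt("What time is the meeting?", ["...", "..."], attended))
```

In the actual system, the attended-speaker estimate is integrated inside an auditory LLM rather than appended to a text prompt; this sketch only shows how the decoded attentional state can steer response generation.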
Mar-14-2025
- Country:
- North America > United States (0.93)
- Genre:
  - Research Report > Experimental Study (1.00)
  - Research Report > New Finding (0.93)
- Industry:
  - Health & Medicine > Health Care Technology (0.87)
  - Health & Medicine > Therapeutic Area > Neurology (1.00)
- Technology: