AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang, Sukru Samet Dindar, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Guy M. McKhann, Daniel Friedman, Adeen Flinker, Nima Mesgarani
arXiv.org Artificial Intelligence
However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and to refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention, taking a first step toward intention-aware auditory AI.

Figure 1: AAD-LLM is a brain-computer interface (BCI) for auditory scene understanding. It decodes neural signals to identify the attended speaker and integrates this information into a language model, generating responses that align with the listener's perceptual focus.
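The abstract describes a two-stage pipeline: first decode the attended speaker from neural activity, then condition the language model's response on that attentional state. The sketch below is only an illustration of that idea, not the authors' implementation; it assumes a simple correlation-based attention decoder over speech envelopes (a common auditory attention decoding strategy) and plain prompt conditioning, and all function names, signal shapes, and the decoder choice are assumptions.

```python
import numpy as np

def decode_attended_speaker(neural_envelope: np.ndarray,
                            speaker_envelopes: list[np.ndarray]) -> int:
    """Correlate a neurally reconstructed speech envelope with each candidate
    speaker's envelope and return the index of the best-matching speaker."""
    corrs = [np.corrcoef(neural_envelope, env)[0, 1] for env in speaker_envelopes]
    return int(np.argmax(corrs))

def build_conditioned_prompt(question: str,
                             transcripts: list[str],
                             attended_idx: int) -> str:
    """Mark the decoded attended speaker in the prompt so the language model
    can ground its answer in the listener's perceptual focus."""
    lines = [
        f"Speaker {i}{' (attended)' if i == attended_idx else ''}: {t}"
        for i, t in enumerate(transcripts)
    ]
    return "\n".join(lines) + f"\nListener question: {question}"

# Toy example: synthetic envelopes in which speaker 1 matches the "neural" signal.
rng = np.random.default_rng(0)
speakers = [rng.standard_normal(2000) for _ in range(2)]
neural = speakers[1] + 0.5 * rng.standard_normal(2000)  # stand-in for an iEEG-decoded envelope
attended = decode_attended_speaker(neural, speakers)     # -> 1
print(build_conditioned_prompt("What time is the meeting?", ["...", "..."], attended))
```

In the actual system, the attended-speaker estimate is integrated inside an auditory LLM rather than appended to a text prompt; this sketch only shows how the decoded attentional state can steer response generation.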
Mar-14-2025
- Country:
- North America > United States (0.93)
- Genre:
  - Research Report > Experimental Study (1.00)
  - Research Report > New Finding (0.93)
- Industry:
  - Health & Medicine > Health Care Technology (0.87)
  - Health & Medicine > Therapeutic Area > Neurology (1.00)
- Technology: