Collaborating Authors: Flinker, Adeen


AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

arXiv.org Artificial Intelligence

Human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention, taking a first step toward intention-aware auditory AI.

Figure 1: AAD-LLM is a brain-computer interface (BCI) for auditory scene understanding. It decodes neural signals to identify the attended speaker and integrates this information into a language model, generating responses that align with the listener's perceptual focus.
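The core loop described here (decode the attended speaker from neural activity, then condition generation on that state) can be sketched with a classic envelope-reconstruction approach to auditory attention decoding. This is a minimal stand-in, not the paper's pipeline: the backward Ridge model, all array shapes, and the prompt-injection step below are illustrative assumptions.

```python
# Minimal sketch of envelope-based auditory attention decoding (AAD),
# a classic stand-in for the neural attention decoder in AAD-LLM.
# Shapes, names, and the Ridge backward model are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

def decode_attended_speaker(ieeg, envelopes, model):
    """ieeg: (T, channels) neural features; envelopes: list of (T,) speech
    envelopes, one per speaker; model: fitted backward (neural -> audio) model."""
    recon = model.predict(ieeg)                      # reconstructed envelope, (T,)
    corrs = [np.corrcoef(recon, env)[0, 1] for env in envelopes]
    return int(np.argmax(corrs))                     # index of the attended speaker

# Training on a labeled session: map iEEG to the attended speaker's envelope.
rng = np.random.default_rng(0)
T, C = 5000, 64
ieeg_train = rng.standard_normal((T, C))             # placeholder neural data
attended_env = rng.standard_normal(T)                # placeholder attended envelope
backward_model = Ridge(alpha=1.0).fit(ieeg_train, attended_env)

# At inference, the decoded speaker index conditions the downstream LLM,
# e.g. by prepending "The listener is attending to speaker {k}." to the prompt.
speaker = decode_attended_speaker(rng.standard_normal((T, C)),
                                  [rng.standard_normal(T) for _ in range(2)],
                                  backward_model)
print(f"Attended speaker: {speaker}")
```

The correlate-and-argmax step is the standard AAD decision rule; how the attentional state is actually fused into the language model (prompt text, soft tokens, or cross-attention) is a design choice this sketch leaves open.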


The Temporal Structure of Language Processing in the Human Brain Corresponds to The Layered Hierarchy of Deep Language Models

arXiv.org Artificial Intelligence

Deep Language Models (DLMs) provide a novel computational paradigm for understanding the mechanisms of natural language processing in the human brain. Unlike traditional psycholinguistic models, DLMs use layered sequences of continuous numerical vectors to represent words and context, allowing a plethora of emerging applications such as human-like text generation. In this paper we show evidence that the layered hierarchy of DLMs may be used to model the temporal dynamics of language comprehension in the brain by demonstrating a strong correlation between DLM layer depth and the time at which layers are most predictive of the human brain. Our ability to temporally resolve individual layers benefits from our use of electrocorticography (ECoG) data, which has a much higher temporal resolution than noninvasive methods like fMRI. Using ECoG, we record neural activity from participants listening to a 30-minute narrative while also feeding the same narrative to a high-performing DLM (GPT2-XL). We then extract contextual embeddings from the different layers of the DLM and use linear encoding models to predict neural activity. We first focus on the Inferior Frontal Gyrus (IFG, or Broca's area) and then extend our model to track the increasing temporal receptive window along the linguistic processing hierarchy from auditory to syntactic and semantic areas. Our results reveal a connection between human language processing and DLMs, with the DLM's layer-by-layer accumulation of contextual information mirroring the timing of neural activity in high-order language areas.
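As a rough illustration of the analysis pipeline, layer-wise contextual embeddings can be pulled from a pretrained model and regressed against electrode activity at multiple lags. The sketch below is an assumption-laden approximation: it uses the Hugging Face transformers API, a small gpt2 checkpoint in place of GPT2-XL, synthetic neural data, and a generic cross-validated ridge encoding model rather than the paper's exact procedure.

```python
# Minimal sketch of layer-wise linear encoding analysis: for each DLM layer,
# fit a linear model from that layer's embeddings to neural activity at
# several lags, and find the lag where the layer is most predictive.
import numpy as np
import torch
from transformers import GPT2Tokenizer, GPT2Model
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

tok = GPT2Tokenizer.from_pretrained("gpt2")          # stand-in for gpt2-xl
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True).eval()

text = "the thirty minute narrative presented to the participants goes here"
ids = tok(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**ids).hidden_states              # (n_layers + 1) x (1, n_tokens, d)

# Synthetic stand-in for one electrode's high-gamma power per token, at 5 lags.
n_tokens = hidden[0].shape[1]
rng = np.random.default_rng(0)
neural = rng.standard_normal((n_tokens, 5))          # (tokens, lags)

# For each layer, score a cross-validated ridge encoding model at each lag;
# per the paper, deeper layers should peak at later lags in high-order areas.
for layer, h in enumerate(hidden):
    X = h[0].numpy()
    rs = []
    for lag in range(neural.shape[1]):
        pred = cross_val_predict(Ridge(alpha=10.0), X, neural[:, lag], cv=5)
        rs.append(np.corrcoef(pred, neural[:, lag])[0, 1])
    print(f"layer {layer}: peak encoding at lag index {int(np.argmax(rs))}")
```

With real ECoG data, the per-layer peak lags plotted against layer depth would give the correlation the abstract describes.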


Reconstructing Speech Stimuli From Human Auditory Cortex Activity Using a WaveNet Approach

arXiv.org Machine Learning

The superior temporal gyrus (STG) region of cortex critically contributes to speech recognition. In this work, we show that a proposed deep network inspired by WaveNet, trained with limited available data, is able to reconstruct speech stimuli from STG intracranial recordings. We further investigate the impulse response of the fitted model for each recording electrode and observe phoneme-level temporospectral tuning properties in some recorded areas. This finding is consistent with previous studies implicating the posterior STG (pSTG) in a phonetic representation of speech, and it details the acoustic features that certain electrode sites may extract during speech recognition. Research on the STG has shown that this area plays an important role in word and sentence recognition at a phonetic and prelexical stage [1]-[9].
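A WaveNet-inspired decoder here means stacked dilated causal convolutions with gated activations, mapping multichannel neural recordings to an audio target. The sketch below is an approximation under stated assumptions, with illustrative layer sizes, a simple regression head, and a toy impulse-response probe; it is not the paper's exact network, and for a nonlinear model the "impulse response" is only a local characterization.

```python
# Minimal sketch of a WaveNet-style neural-to-speech decoder plus the
# per-electrode impulse-response probe described in the abstract.
# All sizes and the regression output head are illustrative assumptions.
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = (2 - 1) * dilation                # left-pad to keep the conv causal
        self.conv = nn.Conv1d(ch, 2 * ch, kernel_size=2, dilation=dilation)
        self.res = nn.Conv1d(ch, ch, kernel_size=1)

    def forward(self, x):
        h = self.conv(nn.functional.pad(x, (self.pad, 0)))
        filt, gate = h.chunk(2, dim=1)               # gated activation, as in WaveNet
        h = torch.tanh(filt) * torch.sigmoid(gate)
        return x + self.res(h)                       # residual connection

class NeuralToSpeech(nn.Module):
    def __init__(self, n_electrodes=64, ch=32, n_blocks=6):
        super().__init__()
        self.inp = nn.Conv1d(n_electrodes, ch, kernel_size=1)
        self.blocks = nn.Sequential(*[DilatedBlock(ch, 2 ** i) for i in range(n_blocks)])
        self.out = nn.Conv1d(ch, 1, kernel_size=1)   # regress a speech waveform/envelope

    def forward(self, x):                            # x: (batch, electrodes, time)
        return self.out(self.blocks(self.inp(x)))

# Impulse-response probe: drive one electrode with a unit impulse and read out
# the model's response, revealing that electrode's temporospectral tuning.
net = NeuralToSpeech().eval()
probe = torch.zeros(1, 64, 512)
probe[0, 10, 0] = 1.0                                # unit impulse on electrode 10
with torch.no_grad():
    ir = net(probe)[0, 0]                            # (time,) response to the impulse
```

Spectrally analyzing `ir` per electrode is one way to surface the phoneme-level tuning properties the abstract reports.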