Improving semantic understanding in speech language models via brain-tuning
Moussa, Omer, Klakow, Dietrich, Toneva, Mariya
arXiv.org Artificial Intelligence
Speech language models align with human brain responses to natural language to an impressive degree. However, current models rely heavily on low-level speech features, indicating that they lack brain-relevant semantics, which limits their utility as model organisms of semantic processing in the brain. In this work, we address this limitation by inducing brain-relevant bias directly into the models via fine-tuning with fMRI recordings of people listening to natural stories, a process we name brain-tuning. After testing it on 3 different pretrained model families, we show that brain-tuning not only improves overall alignment with new brain recordings in semantic language regions, but also reduces the reliance on low-level speech features for this alignment. Excitingly, we further show that brain-tuning leads to 1) consistent improvements in performance on a range of downstream tasks and 2) a representational space with increased semantic preference. Our results provide converging evidence, for the first time, that incorporating brain signals into the training of language models improves the models' semantic understanding.

It is an exciting time for the cognitive neuroscience of language with the rise of language models, which have been shown to align with human brain responses to language. Researchers aim to use language models as model organisms (Toneva, 2021) of reading and listening in the brain to learn more about the underlying information processing that leads to brain-like representations of language. However, recent work has questioned whether current popular speech language models can serve this role fully, as their alignment with semantic brain regions was shown to be mostly due to low-level speech features, indicating that speech language models lack brain-relevant semantics (Oota et al., 2024a).
Given that most large public brain-recording datasets are of speech-evoked language (LeBel et al., 2023; Nastase et al., 2021; Deniz et al., 2019; Momenian et al., 2024), having access to speech models with improved brain-relevant semantics is important and will provide better model organisms for auditory language processing. The lack of brain-relevant semantics in speech models (Oota et al., 2024a) may also be related to their incomplete semantic understanding on downstream language tasks (Choi et al., 2024). To bridge the gap between language understanding in speech models and the human brain, we propose to augment pretrained speech model training directly with brain recordings, a process we call brain-tuning (see Figure 1a for an illustration of the training approach).
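At a high level, brain-tuning amounts to fine-tuning a pretrained speech model so that its representations also predict fMRI responses to the same stimuli. The following PyTorch sketch is purely illustrative and not the authors' implementation: the tiny MLP stands in for a pretrained speech encoder (e.g., a wav2vec-style model), the voxel count and feature dimensions are made up, and the data are random tensors playing the role of stimulus features aligned to fMRI TRs.

```python
import torch
import torch.nn as nn

class BrainTuningHead(nn.Module):
    """Hypothetical head: map speech-model features to fMRI voxel responses."""
    def __init__(self, feat_dim: int, n_voxels: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, n_voxels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)

# Toy stand-in for a pretrained speech encoder (assumed architecture).
encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 128))
head = BrainTuningHead(feat_dim=128, n_voxels=500)

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-4
)
loss_fn = nn.MSELoss()

# Fake batch: 8 stimulus windows of audio features, each aligned to one fMRI TR.
audio_feats = torch.randn(8, 80)
fmri_responses = torch.randn(8, 500)

# Fine-tune encoder + head jointly to predict brain responses (the "brain-tuning" step).
for _ in range(3):
    predicted = head(encoder(audio_feats))
    loss = loss_fn(predicted, fmri_responses)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice one would start from real pretrained weights, align audio windows to fMRI acquisition times, and possibly freeze lower layers; this sketch only shows the shape of the objective: gradients from a brain-prediction loss flow back into the speech model itself.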
Oct-15-2024
- Genre:
- Research Report > New Finding (0.66)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (1.00)