LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale

Miran Özdogan, Gilad Landau, Gereon Elvers, Dulhan Jayalath, Pratik Somaiya, Francesco Mantegna, Mark Woolrich, Oiwi Parker Jones

Jun-4-2025–arXiv.org Artificial Intelligence

LibriBrain represents the largest single-subject MEG dataset to date for speech decoding, with over 50 hours of recordings: 5× larger than the next comparable dataset and 50× larger than most. This unprecedented "depth" of within-subject data enables exploration of neural representations at a scale previously unavailable with non-invasive methods. LibriBrain comprises high-quality MEG recordings together with detailed annotations from a single participant listening to naturalistic spoken English, covering nearly the full Sherlock Holmes canon. Designed to support advances in neural decoding, LibriBrain comes with a Python library for streamlined integration with deep learning frameworks, standard data splits for reproducibility, and baseline results for three foundational decoding tasks: speech detection, phoneme classification, and word classification. Baseline experiments demonstrate that increasing training data yields substantial improvements in decoding performance, highlighting the value of scaling up deep, within-subject datasets. By releasing this dataset, we aim to empower the research community to advance speech decoding methodologies and accelerate the development of safe, effective clinical brain-computer interfaces.

Country:
- North America > United States (0.67)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine
  - Therapeutic Area > Neurology (1.00)
  - Health Care Technology (1.00)
  - Diagnostic Medicine (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (1.00)
  - Natural Language (1.00)
  - Cognitive Science > Neuroscience (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Performance Analysis > Accuracy (0.67)
