NeuroTTT: Bridging Pretraining-Downstream Task Misalignment in EEG Foundation Models via Test-Time Training

Suli Wang, Yangshen Deng, Zhenghua Bao, Xinyu Zhan, Yiqun Duan


Large-scale foundation models for EEG signals offer a promising path to generalizable brain-computer interface (BCI) applications, but they often suffer from misalignment between pretraining objectives and downstream tasks, as well as significant fine-tuning and test-time distribution shifts. We introduce NeuroTTT, a two-stage alignment strategy that bridges the gap between generic pretraining and task-specific EEG decoding. First, we perform domain-specific self-supervised fine-tuning, augmenting the foundation model with task-relevant self-supervised objectives that align latent representations to important spectral, spatial, and temporal EEG features without requiring additional labeled data. Second, we incorporate test-time training (TTT) at inference: we apply (i) self-supervised test-time training on individual unlabeled test samples and (ii) prediction entropy minimization (Tent), which updates only normalization statistics to continually calibrate the model to each new input on the fly. Our approach, which to our knowledge is the first to unify domain-tuned self-supervision with test-time training in large-scale EEG foundation models, yields substantially improved robustness and accuracy across diverse BCI tasks (imagined speech, stress detection, motor imagery). Using CBraMod and LaBraM as backbones, our method pushes their performance to a markedly higher level. Results on three diverse tasks demonstrate that the proposed alignment strategy achieves state-of-the-art performance, outperforming conventional fine-tuning and adaptation methods.

Electroencephalography (EEG) is a non-invasive technique for measuring the brain's electrical activity and underpins a variety of brain-computer interface (BCI) applications, including but not limited to imagined speech decoding (Proix et al., 2022), mental stress detection (Badr et al., 2024), emotion recognition (Li et al., 2022), and motor imagery classification (Altaheri et al., 2023). Early EEG decoding approaches relied heavily on handcrafted features and traditional machine learning (Ramoser et al., 2000; Ang et al., 2008). In recent years, deep learning models have achieved superior performance by learning directly from raw EEG signals in an end-to-end fashion (Craik et al., 2019; Al-Saegh et al., 2021). However, most deep learning models employ supervised learning tailored to specific tasks or datasets and therefore generalize poorly. Inspired by the success of foundation models in natural language processing (NLP) and computer vision (CV) (Devlin et al., 2019; Liu et al., 2024), researchers have begun developing large-scale EEG foundation models: pretrained models intended to serve as general feature extractors for diverse EEG tasks (Lai et al., 2025).
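
To make the Tent-style component of the abstract concrete, the following minimal PyTorch sketch shows entropy minimization that updates only normalization-layer parameters on an unlabeled test batch. It is a generic illustration of the technique, not the paper's released code; the function names (configure_tent, tent_step) and optimizer settings are illustrative assumptions.

import torch
import torch.nn as nn


def configure_tent(model: nn.Module):
    """Freeze all weights except normalization affine parameters."""
    model.train()  # norm layers use current-batch statistics
    for p in model.parameters():
        p.requires_grad_(False)
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)):
            if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
                # Drop stale running statistics so the forward pass
                # re-estimates mean/variance from the test input.
                m.track_running_stats = False
                m.running_mean, m.running_var = None, None
            for p in m.parameters(recurse=False):  # gamma and beta only
                p.requires_grad_(True)
                params.append(p)
    return params


def softmax_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of the model's softmax predictions."""
    log_p = logits.log_softmax(dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1).mean()


def tent_step(model: nn.Module, x: torch.Tensor,
              optimizer: torch.optim.Optimizer) -> torch.Tensor:
    """One adaptation step on an unlabeled batch, then predict."""
    logits = model(x)
    loss = softmax_entropy(logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.detach()


# Usage on a stream of unlabeled EEG windows (shapes hypothetical):
# params = configure_tent(model)
# optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9)
# for x in test_stream:            # x: (batch, channels, time)
#     preds = tent_step(model, x, optimizer).argmax(dim=-1)

Because only the normalization scale and shift parameters receive gradients, each adaptation step is cheap and low-risk compared with full fine-tuning, which is what allows the model to be recalibrated on every incoming test input.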