TTT++: When Does Self-Supervised Test-Time Training Fail or Thrive?
Test-time training (TTT) through self-supervised learning (SSL) is an emerging paradigm for tackling distribution shifts. Despite encouraging results, it remains unclear when this approach thrives or fails. In this work, we first provide an in-depth look at its limitations and show that TTT can deteriorate, rather than improve, test-time performance in the presence of severe distribution shifts. To address this issue, we introduce a test-time feature alignment strategy utilizing offline feature summarization and online moment matching, which regularizes adaptation without revisiting training data. We further scale this strategy to the online setting through batch-queue decoupling, enabling robust moment estimates even with limited batch size. Given aligned feature distributions, we then shed light on the strong potential of TTT by theoretically analyzing its post-adaptation performance.
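To make the feature-alignment idea concrete, here is a minimal sketch in the spirit of the abstract: match the test-batch feature moments to statistics summarized offline from training, with a queue decoupling moment estimation from the small online batch. All names (`moment_matching_loss`, `FeatureQueue`) and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Sketch: online moment matching against offline training statistics.
import torch

def moment_matching_loss(feats, train_mean, train_cov):
    """Penalize divergence between test-time and offline training moments."""
    mu = feats.mean(dim=0)
    centered = feats - mu
    cov = centered.T @ centered / max(feats.shape[0] - 1, 1)
    return ((mu - train_mean) ** 2).sum() + ((cov - train_cov) ** 2).sum()

class FeatureQueue:
    """Batch-queue decoupling: estimate moments from a FIFO buffer of recent
    test features instead of a single (possibly tiny) batch."""
    def __init__(self, dim, capacity=1024):
        self.buf = torch.zeros(capacity, dim)
        self.ptr, self.full, self.capacity = 0, False, capacity

    def push(self, feats):
        n = feats.shape[0]
        idx = (self.ptr + torch.arange(n)) % self.capacity
        self.buf[idx] = feats.detach()
        self.full = self.full or self.ptr + n >= self.capacity
        self.ptr = (self.ptr + n) % self.capacity

    def features(self):
        return self.buf if self.full else self.buf[: self.ptr]

# Usage (assumed setup): push each test batch's encoder features into the
# queue, then add moment_matching_loss(queue.features(), train_mean, train_cov)
# to the SSL objective as a regularizer during adaptation.
```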
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Hübotter, Jonas, Wolf, Patrik, Shevchenko, Alexander, Jüni, Dennis, Krause, Andreas, Kur, Gil
Many standard TTT methods train on carefully selected data from the pre-training dataset (i.e., they do not add any new privileged information; Hardt & Sun, 2024; Hübotter et al., 2025), and several works have studied how to optimally select data for imitation, e.g., the early seminal work of MacKay (1992) and recent extensions (Hübotter et al., 2024; Bagatella et al., 2025b). TTT has also been extended from supervised learning to reinforcement learning (Zuo et al., 2025; Bagatella et al., 2025a; Diaz-Bone et al., 2025). So far, it has not been well understood why and when TTT is effective. While many different methods have been proposed for TTT, we focus here on analyzing "semi-parametric" TTT (e.g., Hardt & Sun, 2024; Hübotter et al., 2025), where a pre-trained model is fine-tuned with a supervised loss on a small neighborhood of the test point in the training data. This differs from other methods for test-time "adaptation", which are commonly applied under distribution shift (e.g., Wang et al., 2021; Zhang et al., 2022; Durasov et al., 2025). Basu et al. (2023) consider a similar setting to ours, but analyze it through the lens of non-parametric estimation, relying on the smoothness of the target function in the feature space Ψ.
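The "semi-parametric" TTT recipe described here admits a compact sketch: retrieve the training points nearest to the test input in feature space, briefly fine-tune a copy of the pre-trained model on them with the ordinary supervised loss, then predict. Names and hyperparameters (`featurize`, `k`, `steps`, `lr`) are assumptions for illustration.

```python
# Sketch: fine-tune on a neighborhood of the test point, then predict.
import copy
import torch
import torch.nn.functional as F

def ttt_predict(model, featurize, train_x, train_y, x_test,
                k=32, steps=10, lr=1e-4):
    # Select a neighborhood of the test point in the feature space (Psi).
    with torch.no_grad():
        dists = torch.cdist(featurize(x_test.unsqueeze(0)),
                            featurize(train_x))[0]
    idx = dists.topk(k, largest=False).indices
    # Fine-tune a copy of the pre-trained model with the supervised loss.
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(local(train_x[idx]), train_y[idx])
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return local(x_test.unsqueeze(0))
```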
Few-shot Protein Fitness Prediction via In-context Learning and Test-time Training
Teufel, Felix, Kollasch, Aaron W., Huang, Yining, Winther, Ole, Yang, Kevin K., Notin, Pascal, Marks, Debora S.
Accurately predicting protein fitness with minimal experimental data is a persistent challenge in protein engineering. We introduce PRIMO (PRotein In-context Mutation Oracle), a transformer-based framework that leverages in-context learning and test-time training to adapt rapidly to new proteins and assays without large task-specific datasets. By encoding sequence information, auxiliary zero-shot predictions, and sparse experimental labels from many assays as a unified token set within a masked-language-modeling pre-training paradigm, PRIMO learns to prioritize promising variants through a preference-based loss function. Across diverse protein families and properties, including both substitution and indel mutations, PRIMO outperforms zero-shot and fully supervised baselines. This work underscores the power of combining large-scale pre-training with efficient test-time adaptation to tackle challenging protein design tasks where data collection is expensive and label availability is limited.
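The abstract does not specify the form of the preference-based loss; one plausible instantiation is a pairwise logistic ranking loss that pushes the model to score higher-fitness variants above lower-fitness ones. The sketch below is an assumption about that form, not PRIMO's actual objective.

```python
# Sketch: pairwise preference (ranking) loss over variant scores.
import torch
import torch.nn.functional as F

def pairwise_preference_loss(scores, fitness):
    """scores: model scores for a batch of variants (shape [B]);
    fitness: measured fitness labels (shape [B]).
    For every pair (i, j) with fitness[i] > fitness[j],
    apply a logistic loss encouraging scores[i] > scores[j]."""
    diff_s = scores.unsqueeze(1) - scores.unsqueeze(0)            # s_i - s_j
    prefer = (fitness.unsqueeze(1) > fitness.unsqueeze(0)).float()
    loss = F.softplus(-diff_s) * prefer    # -log sigmoid(s_i - s_j) on pairs
    return loss.sum() / prefer.sum().clamp(min=1.0)
```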
Context Tuning for In-Context Optimization
Lu, Jack, Teehan, Ryan, Yang, Zhenbang, Ren, Mengye
We introduce Context Tuning, a simple and effective method to significantly enhance few-shot adaptation of large language models (LLMs) without fine-tuning model parameters. While prompt-based adaptation techniques have demonstrated the effectiveness of lightweight adaptation methods for LLMs, they typically initialize a trainable prompt or prefix with tokens irrelevant to the task at hand. In contrast, Context Tuning initializes the trainable prompt or prefix with task-specific demonstration examples, leveraging the model's inherent In-Context Learning (ICL) ability to extract relevant information for improved few-shot learning performance. Extensive evaluations on benchmarks such as CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC demonstrate that Context Tuning outperforms traditional prompt-based adaptation methods and achieves accuracy competitive with Test-Time Training at significantly higher training efficiency.
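The core idea reduces to an initialization choice, sketched below: seed the trainable soft prompt with the embeddings of the task demonstrations rather than random or irrelevant tokens, then optimize only the prompt. The embedding-layer interface here is an assumed HuggingFace-style stand-in, not the paper's code.

```python
# Sketch: initialize a soft prompt from demonstration-example embeddings.
import torch

def init_context_tuned_prompt(embed_tokens, demo_token_ids):
    """embed_tokens: the model's (frozen) input-embedding layer;
    demo_token_ids: token ids of the concatenated demonstrations."""
    with torch.no_grad():
        init = embed_tokens(demo_token_ids)        # (prompt_len, hidden)
    return torch.nn.Parameter(init.clone())        # trainable soft prompt

# Usage (assumed): prepend the prompt to the input embeddings at every
# forward pass and optimize only the prompt, keeping the LLM frozen, e.g.
#   prompt = init_context_tuned_prompt(model.get_input_embeddings(), demo_ids)
#   opt = torch.optim.Adam([prompt], lr=1e-3)
```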
NeuroTTT: Bridging Pretraining-Downstream Task Misalignment in EEG Foundation Models via Test-Time Training
Wang, Suli, Deng, Yangshen, Bao, Zhenghua, Zhan, Xinyu, Duan, Yiqun
Large-scale foundation models for EEG signals offer a promising path to generalizable brain-computer interface (BCI) applications, but they often suffer from misalignment between pretraining objectives and downstream tasks, as well as significant fine-tuning and test-time distribution shifts. We introduce NeuroTTT, a two-stage alignment strategy that bridges the gap between generic pretraining and task-specific EEG decoding. First, we perform domain-specific self-supervised fine-tuning, augmenting the foundation model with task-relevant self-supervised objectives that align latent representations to important spectral, spatial, and temporal EEG features without requiring additional labeled data. Second, we incorporate test-time training (TTT) at inference: we apply (i) self-supervised test-time training on individual unlabeled test samples and (ii) prediction entropy minimization (Tent), which updates only normalization statistics, to continually calibrate the model to each new input on the fly. Our approach, which, to our knowledge, is the first to unify domain-tuned self-supervision with test-time training in large-scale EEG foundation models, yields substantially improved robustness and accuracy across diverse BCI tasks (imagined speech, stress detection, motor imagery). Using CBraMod and LaBraM as backbones, our method pushes their performance to a markedly higher level. Results on three diverse tasks demonstrate that the proposed alignment strategy achieves state-of-the-art performance, outperforming conventional fine-tuning and adaptation methods.

Electroencephalography (EEG) is a non-invasive technique for measuring brain electrical activity and underpins a variety of brain-computer interface (BCI) applications, including but not limited to imagined speech decoding (Proix et al., 2022), mental stress detection (Badr et al., 2024), emotion recognition (Li et al., 2022), and motor imagery classification (Altaheri et al., 2023). Early EEG decoding approaches relied heavily on handcrafted features and traditional machine learning (Ramoser et al., 2000; Ang et al., 2008). In recent years, deep learning models have achieved superior performance by learning directly from raw EEG signals in an end-to-end fashion (Craik et al., 2019; Al-Saegh et al., 2021). However, most deep learning models employ supervised learning methods tailored to specific tasks or datasets and lack generalization ability. Inspired by the success of foundation models in natural language processing (NLP) and computer vision (CV) (Devlin et al., 2019; Liu et al., 2024), researchers have begun developing large-scale EEG foundation models: pretrained models intended to serve as general feature extractors for diverse EEG tasks (Lai et al., 2025).
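The Tent-style component referenced in the abstract has a simple shape, sketched below: minimize the entropy of the model's predictions while updating only normalization-layer parameters. The choice of norm modules and hyperparameters here are illustrative assumptions, not NeuroTTT's exact configuration.

```python
# Sketch: entropy minimization (Tent-style) over normalization parameters.
import torch
import torch.nn as nn

def collect_norm_params(model):
    """Gather only the affine parameters of normalization layers."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)):
            params += list(m.parameters())
    return params

def tent_step(model, x, optimizer):
    """One adaptation step on an unlabeled test input x."""
    logits = model(x)
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad(); entropy.backward(); optimizer.step()
    return entropy.item()

# Usage (assumed): opt = torch.optim.SGD(collect_norm_params(model), lr=1e-3)
# then call tent_step(model, x, opt) for each incoming test sample.
```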
Test time training enhances in-context learning of nonlinear functions
Kuwataka, Kento, Suzuki, Taiji
Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction to adapt to the test data. While TTT has demonstrated considerable empirical success, its theoretical underpinnings remain limited, particularly for nonlinear models. In this paper, we investigate the combination of TTT with in-context learning (ICL), where the model is given a few examples from the target distribution at inference time. We analyze this framework in the setting of single-index models $y=σ_*(\langle β, \mathbf{x} \rangle)$, where the feature vector $β$ is drawn from a hidden low-dimensional subspace. For single-layer transformers trained with gradient-based algorithms and adopting TTT, we establish an upper bound on the prediction risk. Our theory reveals that TTT enables the single-layer transformers to adapt to both the feature vector $β$ and the link function $σ_*$, which vary across tasks. This creates a sharp contrast with ICL alone, which is theoretically difficult to adapt to shifts in the link function. Moreover, we provide the convergence rate with respect to the data length, showing the predictive error can be driven arbitrarily close to the noise level as the context size and the network width grow.
Uncertainty-aware Test-Time Training (UT$^3$) for Efficient On-the-fly Domain Adaptive Dense Regression
Deep neural networks (DNNs) are increasingly being used in autonomous systems. However, DNNs do not generalize well under domain shift. Adapting to a continuously evolving environment is a safety-critical challenge inevitably faced by all autonomous systems deployed in the real world. Recent work on test-time training proposes methods that adapt to a new test distribution on the fly by optimizing the DNN model for each test input using self-supervision. However, these techniques cause a sharp increase in inference time, as multiple forward and backward passes are required for a single test sample (for test-time training) before the prediction is finally made from the fine-tuned features. This is undesirable for real-world robotics applications where such models may be deployed on resource-constrained hardware with strict latency requirements. In this work, we propose a new framework (called UT$^3$) that leverages test-time training for improved performance in the presence of continuous domain shift while also decreasing the inference time, making it suitable for real-world applications. Our method introduces an uncertainty-aware self-supervision task for efficient test-time training that leverages the quantified uncertainty to selectively apply training, yielding sharp improvements in inference time while performing comparably to the standard test-time training protocol. The proposed protocol offers a continuous criterion for identifying keyframes, allowing the end-user to control how often test-time training is applied. We demonstrate the efficacy of our method on a dense regression task: monocular depth estimation.
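The gating logic described above can be sketched as follows: estimate predictive uncertainty cheaply, and run the expensive TTT update only on keyframes whose uncertainty exceeds a user-set threshold. The MC-dropout uncertainty proxy, the self-supervised loss hook, and the threshold are illustrative assumptions, not the paper's exact design.

```python
# Sketch: uncertainty-gated test-time training for dense regression.
import torch

def predictive_uncertainty(model, x, n_samples=8):
    """Variance of MC-dropout predictions as a cheap uncertainty proxy."""
    model.train()                          # keep dropout active for sampling
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.var(dim=0).mean().item()

def maybe_ttt(model, x, ssl_loss_fn, optimizer, threshold=0.05):
    """Adapt on x only if it qualifies as a keyframe; the threshold is the
    user-facing knob controlling how often TTT runs."""
    if predictive_uncertainty(model, x) > threshold:   # keyframe: adapt
        model.train()
        loss = ssl_loss_fn(model, x)       # self-supervised objective (assumed)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    model.eval()
    with torch.no_grad():
        return model(x)                    # e.g., the depth prediction
```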