Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Jonas Hübotter, Patrik Wolf, Alexander Shevchenko, Dennis Jüni, Andreas Krause, Gil Kur
Many standard TTT methods train on carefully selected data from the pre-training dataset (i.e., they do not add any new privileged information; Hardt & Sun, 2024; Hübotter et al., 2025), and several works have studied how to optimally select data for imitation, e.g., the early seminal work of MacKay (1992) and recent extensions (Hübotter et al., 2024; Bagatella et al., 2025b). TTT has also been extended from supervised learning to reinforcement learning (Zuo et al., 2025; Bagatella et al., 2025a; Diaz-Bone et al., 2025). So far, it has not been well understood why or when TTT is effective. While many different methods have been proposed for TTT, we focus here on analyzing "semi-parametric" TTT (e.g., Hardt & Sun, 2024; Hübotter et al., 2025), where a pre-trained model is fine-tuned with a supervised loss on a small neighborhood of the test point in the training data. This differs from other methods for test-time "adaptation", which are commonly applied under distribution shifts (e.g., Wang et al., 2021; Zhang et al., 2022; Durasov et al., 2025). Basu et al. (2023) consider a setting similar to ours, but analyze it through the lens of non-parametric estimation, relying on the smoothness of the target function in the feature space Ψ.
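To make the semi-parametric TTT setup concrete, here is a minimal PyTorch sketch: retrieve the nearest training examples to the test point in a feature space, fine-tune a copy of the pre-trained model on that neighborhood with a supervised loss, then predict. It assumes a classification model and an embedding function; the names and hyperparameters (`embed_fn`, `num_neighbors`, `lr`, `num_steps`) are our own illustrative assumptions, not the exact procedure of Hardt & Sun (2024) or Hübotter et al. (2025).

```python
# Sketch of "semi-parametric" test-time training: specialize a pre-trained
# model on a small neighborhood of the test point drawn from the training
# data itself (no new privileged information is added).
import copy
import torch
import torch.nn.functional as F

def test_time_train(model, embed_fn, x_test, train_x, train_y,
                    num_neighbors=32, lr=1e-4, num_steps=10):
    """Fine-tune a copy of `model` on the `num_neighbors` training
    examples closest to `x_test` in the feature space induced by
    `embed_fn`, then return the specialized model's prediction."""
    with torch.no_grad():
        feats = embed_fn(train_x)                   # (N, d) training features
        query = embed_fn(x_test.unsqueeze(0))       # (1, d) test feature
        dists = torch.cdist(query, feats).squeeze(0)  # (N,) distances
        idx = dists.topk(num_neighbors, largest=False).indices

    # Fine-tune a copy so the pre-trained model is left untouched.
    local_model = copy.deepcopy(model)
    local_model.train()
    opt = torch.optim.SGD(local_model.parameters(), lr=lr)
    for _ in range(num_steps):
        opt.zero_grad()
        # Supervised loss on the retrieved neighborhood (classification
        # assumed here for concreteness).
        loss = F.cross_entropy(local_model(train_x[idx]), train_y[idx])
        loss.backward()
        opt.step()

    local_model.eval()
    with torch.no_grad():
        return local_model(x_test.unsqueeze(0))
```

In this reading, TTT trades a single generalist predictor for a per-query specialist fit to the local structure of the training data around the test point, which is the behavior the paper sets out to explain.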
arXiv.org Artificial Intelligence
Dec-12-2025