Few-shot Protein Fitness Prediction via In-context Learning and Test-time Training
Teufel, Felix, Kollasch, Aaron W., Huang, Yining, Winther, Ole, Yang, Kevin K., Notin, Pascal, Marks, Debora S.
arXiv.org Artificial Intelligence
Accurately predicting protein fitness with minimal experimental data is a persistent challenge in protein engineering. We introduce PRIMO (PRotein In-context Mutation Oracle), a transformer-based framework that leverages in-context learning and test-time training to adapt rapidly to new proteins and assays without large task-specific datasets. By encoding sequence information, auxiliary zero-shot predictions, and sparse experimental labels from many assays as a unified token set in a pre-training masked-language modeling paradigm, PRIMO learns to prioritize promising variants through a preference-based loss function. Across diverse protein families and properties, including both substitution and indel mutations, PRIMO outperforms zero-shot and fully supervised baselines. This work underscores the power of combining large-scale pre-training with efficient test-time adaptation to tackle challenging protein design tasks where data collection is expensive and label availability is limited.
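The abstract does not say which form the preference-based loss takes. As an illustration only, the sketch below shows one common choice, a pairwise logistic (Bradley-Terry style) ranking loss, which penalizes the model whenever a variant with higher measured fitness receives a lower score than a less fit one. The function name `pairwise_preference_loss` and its arguments are hypothetical and are not taken from PRIMO.

```python
import math


def pairwise_preference_loss(scores, labels):
    """Mean pairwise logistic loss over all pairs with distinct labels.

    Illustrative assumption, not the loss defined in the paper.
    scores: model scores per variant (higher = predicted fitter)
    labels: measured fitness values for the same variants
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:  # variant i is preferred over variant j
                margin = scores[i] - scores[j]
                # softplus(-margin) = -log(sigmoid(margin)), computed stably
                total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
                n_pairs += 1
    # With no comparable pairs there is nothing to rank, so the loss is 0.
    return total / n_pairs if n_pairs else 0.0


# Toy few-shot setting: three variants with measured fitness values.
labels = [0.1, 0.5, 0.9]
loss_correct_order = pairwise_preference_loss([0.0, 1.0, 2.0], labels)
loss_reversed_order = pairwise_preference_loss([2.0, 1.0, 0.0], labels)
```

Because the loss depends only on score differences within pairs, it rewards correct ranking rather than calibrated fitness values, which suits the goal of prioritizing promising variants from a handful of labels.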
Dec-3-2025
- Country:
  - Europe
    - Denmark > Capital Region > Copenhagen (0.04)
    - France (0.04)
  - North America
    - Canada > Alberta > Census Division No. 13 > Woodlands County (0.04)
    - United States > Massachusetts > Middlesex County > Cambridge (0.04)
  - South America > Chile
- Genre:
  - Research Report (0.81)