Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Sean McLeish, Ang Li, John Kirchenbauer, Dayal Singh Kalra, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Jonas Geiping, Tom Goldstein, Micah Goldblum
arXiv.org Artificial Intelligence
Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on mathematics, we observe that converting pretrained models into recurrent ones yields better performance at a given compute budget than simply post-training the original non-recurrent language model.
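The two ingredients described above, a shared block applied repeatedly (depth recurrence) and a schedule that ramps up the number of recurrences during training (the curriculum), can be sketched in miniature as follows. Everything here is illustrative: the toy affine block, the linear schedule, and all function names are assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch of depth recurrence with a recurrence curriculum.
# All names and the toy block are illustrative, not from the paper's code.

def recurrent_block(state):
    # Stand-in for a single shared transformer block that would be
    # reused at every recurrence step; here a toy affine update so the
    # example is runnable without any ML framework.
    return [0.5 * x + 1.0 for x in state]

def forward(state, num_recurrences):
    # Depth recurrence: apply the SAME block num_recurrences times.
    # Test-time compute (effective depth) scales with num_recurrences,
    # while the parameter count stays fixed at one block.
    for _ in range(num_recurrences):
        state = recurrent_block(state)
    return state

def recurrence_curriculum(step, total_steps, min_r=1, max_r=8):
    # Hypothetical linear curriculum: start shallow (cheap) and raise
    # the effective depth gradually over the course of training.
    frac = step / max(total_steps - 1, 1)
    return min_r + round(frac * (max_r - min_r))

# Early in training the model runs shallow, late in training it runs deep.
early_r = recurrence_curriculum(0, 1000)    # -> 1
late_r = recurrence_curriculum(999, 1000)   # -> 8
output = forward([0.0, 2.0], late_r)
```

At inference, `num_recurrences` becomes a dial for test-time compute: the same fixed set of parameters can be run deeper on harder inputs, which is the decoupling the abstract refers to.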
Nov-11-2025