Timber: Training-free Instruct Model Refining with Base via Effective Rank

Taiqiang Wu, Runming Yang, Tao Liu, Jiahao Wang, Zenan Xu, Ngai Wong

arXiv.org Artificial Intelligence 

Post-training, which elicits a pretrained Base model into the corresponding Instruct model, is widely considered to be superficial. In this work, we first reinforce this hypothesis by providing novel quantitative evidence at the weight level: the effective rank (eRank) remains negligibly changed. However, this superficiality also entails a critical trade-off: it improves the exploitation capabilities of the Instruct model at the cost of limiting its exploration. To tackle this issue, we propose Timber, a simple yet effective training-free method that enhances the exploration capability of the Instruct model while preserving its exploitation. The key insight is to partially revert the Instruct model towards the paired Base model via subtle yet targeted refinement of the weight deltas. Extensive experiments on the Llama and Qwen series demonstrate that Timber consistently improves vanilla Instruct models, particularly on Pass@k performance. Our findings offer new insights into the post-training stage at the weight level, along with practical strategies to refine Instruct models without training.

Large Language Models (LLMs), such as Qwen3 (Yang et al., 2025), Llama 3 (Grattafiori et al., 2024), and DeepSeek-R1 (Guo et al., 2025), have achieved remarkable success in Natural Language Processing (NLP), especially on reasoning tasks (Huang & Chang, 2022). To train these LLMs, a Base model is first pretrained on huge amounts of data. A post-training stage is then applied to obtain an Instruct model, using supervised fine-tuning (SFT) and reinforcement learning (RL) to elicit alignment and reasoning ability (Yang et al., 2025). The post-training stage tends to be superficial, i.e., post-training only utilizes patterns that the Base model already acquired during pre-training (Yue et al., 2025; Zhou et al., 2023a; Ye et al., 2025; Muennighoff et al., 2025). In this paper, we investigate Base and Instruct models through the lens of effective rank (eRank; Roy & Vetterli, 2007), providing a novel weight-level perspective on the superficiality of post-training. As shown in Figure 1, the eRanks of corresponding linear layers from the Base and Instruct models are almost identical: post-training induces only negligible changes to the effective dimensionality, offering new weight-level evidence for its superficiality.
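To make the eRank measurement concrete, below is a minimal NumPy sketch of the Roy & Vetterli (2007) definition: the exponential of the Shannon entropy of the normalized singular values of a weight matrix. The matrix shapes and the Base/Instruct comparison at the end are illustrative stand-ins, not values from the paper.

```python
import numpy as np

def effective_rank(W: np.ndarray) -> float:
    """eRank (Roy & Vetterli, 2007): the exponential of the Shannon
    entropy of the normalized singular-value distribution of W."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()                            # normalize singular values to a distribution
    entropy = -np.sum(p * np.log(p + 1e-12))   # Shannon entropy in nats; epsilon guards log(0)
    return float(np.exp(entropy))

# Illustrative stand-ins for a Base linear layer and its Instruct counterpart:
W_base = np.random.randn(1024, 1024)
W_inst = W_base + 1e-3 * np.random.randn(1024, 1024)  # small post-training delta
print(effective_rank(W_base), effective_rank(W_inst))  # near-identical eRanks
```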
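The excerpt describes Timber only as a "subtle yet targeted refinement of the weight deltas" that partially reverts the Instruct model towards Base; the exact rule is not given here. As a rough illustration of the general idea (not Timber itself), the sketch below uniformly shrinks the post-training delta of a single layer by a hypothetical reversion strength `alpha`.

```python
import numpy as np

def revert_toward_base(w_base: np.ndarray, w_inst: np.ndarray,
                       alpha: float = 0.1) -> np.ndarray:
    """Partially revert an Instruct weight matrix toward its Base counterpart.

    `alpha` (hypothetical) is the fraction of the post-training delta removed:
    alpha=0 keeps the Instruct weights, alpha=1 recovers the Base weights.
    Timber's actual targeted refinement may select or reweight components of
    the delta rather than scaling it uniformly as done here.
    """
    delta = w_inst - w_base            # weight delta introduced by post-training
    return w_base + (1.0 - alpha) * delta
```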
