Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning

Ruan, Zhiwen, Li, Yixia, Zhu, He, Chen, Yun, Li, Peng, Liu, Yang, Chen, Guanhua

Oct-14-2025–arXiv.org Artificial Intelligence

Supervised fine-tuning (SFT) is the primary method for adapting pre-trained large language models (LLMs) to domain-specific tasks such as mathematical reasoning. However, standard SFT uniformly penalizes all tokens, neglecting the fact that only a small subset of critical tokens determines reasoning correctness. This uniform supervision often reduces output diversity and limits generalization. We propose Critical Token Fine-tuning (CFT), a simple yet effective approach that updates only the tokens identified as functionally indispensable via counterfactual perturbations. By concentrating gradient signals on these decisive reasoning steps while preserving the diversity of non-critical tokens, CFT can improve both generation quality and output diversity. Extensive experiments on five models across three families (Qwen, OLMo, LLaMA) and eleven mathematical reasoning benchmarks show that CFT, despite fine-tuning on less than 12% of tokens, consistently outperforms standard SFT. Moreover, CFT enables test-time scaling through improved sampling diversity and provides a stronger initialization for reinforcement learning, sustaining performance gains in later training stages while maintaining higher entropy for better exploration.

Large language models (LLMs) have achieved remarkable progress across a wide range of complex tasks, driven by the rapid scaling of both model parameters and training data (Fedus et al., 2022; Achiam et al., 2023; AI@Meta, 2024; Team, 2024; Brown et al., 2020). To adapt these general-purpose models to specialized downstream tasks (Yu et al., 2024), the prevailing paradigm is supervised fine-tuning (SFT) (Sanh et al.; Ruan et al., 2025), which optimizes a maximum-likelihood objective on labeled prompt-response pairs (Ouyang et al., 2022). SFT can also serve as an initialization for reinforcement learning (RL), providing a strong starting point for further RL optimization (Chu et al., 2025; Li et al., 2025).
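The abstract contrasts the uniform maximum-likelihood objective of standard SFT with CFT, which first locates critical tokens by counterfactual perturbation and then restricts the loss to them. The Python sketch below illustrates both steps on a toy arithmetic response. It is not the authors' implementation: the function names `find_critical_positions` and `masked_nll`, the substitute table, and the rule that a position counts as critical only when every proposed substitute breaks the final answer are all illustrative assumptions, and the hypothetical checker `toy_is_correct` stands in for whatever answer-verification procedure the real method uses.

```python
from typing import Callable, List, Sequence


def find_critical_positions(
    tokens: Sequence[str],
    alternatives: Callable[[int], List[str]],
    is_correct: Callable[[List[str]], bool],
) -> List[int]:
    """Return positions where no counterfactual substitute keeps the response correct.

    `alternatives(i)` proposes substitute tokens for position i and `is_correct`
    judges a full (perturbed) response; both are hypothetical stand-ins.
    Positions with no proposed substitutes are treated as non-critical.
    """
    critical = []
    for i in range(len(tokens)):
        substitutes = alternatives(i)
        if not substitutes:
            continue
        survives = [
            is_correct(list(tokens[:i]) + [s] + list(tokens[i + 1:]))
            for s in substitutes
        ]
        if not any(survives):  # every perturbation at i breaks the answer
            critical.append(i)
    return critical


def masked_nll(token_log_probs: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean negative log-likelihood over positions where mask is True.

    An all-True mask reproduces the uniform SFT objective; a critical-token
    mask restricts the loss (and hence the gradient) to those positions.
    """
    selected = [-lp for lp, keep in zip(token_log_probs, mask) if keep]
    return sum(selected) / len(selected) if selected else 0.0


# --- Toy example (hypothetical data, not from the paper) ---
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TARGET = 7

tokens = ["We", "add", ":", "3", "+", "4", "=", "7"]
SUBS = {"We": ["Let's"], "add": ["sum"], ":": [","],
        "3": ["2", "5"], "+": ["-", "*"], "4": ["1", "6"], "7": ["6", "8"]}


def toy_alternatives(i: int) -> List[str]:
    # Hypothetical substitute proposals for the token at position i.
    return SUBS.get(tokens[i], [])


def toy_is_correct(resp: List[str]) -> bool:
    # Valid only if the trailing "a op b = c" is true arithmetic and c is the target.
    a, op, b, eq, c = resp[-5:]
    return eq == "=" and op in OPS and OPS[op](int(a), int(b)) == int(c) == TARGET


critical = find_critical_positions(tokens, toy_alternatives, toy_is_correct)
mask = [i in critical for i in range(len(tokens))]
log_probs = [-0.1, -0.2, -0.05, -1.0, -0.5, -1.5, -0.1, -1.0]  # made-up values

print("critical tokens:", [tokens[i] for i in critical])
print("SFT loss (all tokens):", masked_nll(log_probs, [True] * len(tokens)))
print("CFT loss (critical only):", masked_nll(log_probs, mask))
```

Swapping the filler words leaves the answer intact, so only the operands, the operator, and the result are marked critical. In a real training loop the same boolean mask would multiply per-token cross-entropy before averaging, so non-critical tokens contribute no gradient. In this toy half of the tokens end up critical, whereas the abstract reports fine-tuning on under 12% of tokens for real data.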

large language model, machine learning, qwen2
