Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning
Ziang Ye, Zhenru Zhang, Yang Zhang, Jianxin Ma, Junyang Lin, Fuli Feng
arXiv.org Artificial Intelligence
When using agent-task datasets to enhance agent capabilities for Large Language Models (LLMs), current methodologies often treat all tokens within a sample equally. However, we argue that tokens serving different roles, specifically reasoning tokens versus boilerplate tokens (e.g., those governing output format), differ significantly in importance and learning complexity, necessitating their disentanglement and distinct treatment. To address this, we propose a novel Shuffle-Aware Discriminator (SHAD) for adaptive token discrimination. SHAD classifies tokens by exploiting predictability differences observed after shuffling input-output combinations across samples: boilerplate tokens, due to their repetitive nature among samples, maintain predictability, whereas reasoning tokens do not. Using SHAD, we propose the Reasoning-highlighted Fine-Tuning (RFT) method, which adaptively emphasizes reasoning tokens during fine-tuning, yielding notable performance gains over common Supervised Fine-Tuning (SFT).
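The abstract's core idea, that boilerplate tokens stay predictable under shuffled input-output pairings while reasoning tokens do not, and that reasoning tokens can then be up-weighted in the fine-tuning loss, can be illustrated with a minimal sketch. Note this is not the paper's implementation: the sigmoid weighting in `shad_weights` and the function names are hypothetical, and the sketch assumes per-token losses from a shuffled-data reference model and from a normal model are already available.

```python
import math

def shad_weights(loss_shuffled, loss_original):
    """Hypothetical SHAD-style weighting (assumption, not the paper's formula).

    Tokens that remain predictable after shuffling (small loss gap) are
    treated as boilerplate and get a weight near 0.5; tokens whose loss
    rises sharply under shuffling are treated as reasoning tokens and get
    a weight near 1.0.
    """
    return [1.0 / (1.0 + math.exp(-(ls - lo)))
            for ls, lo in zip(loss_shuffled, loss_original)]

def reasoning_highlighted_loss(token_losses, weights):
    """Weighted mean of per-token losses, emphasizing reasoning tokens."""
    total_weight = sum(weights)
    return sum(l * w for l, w in zip(token_losses, weights)) / total_weight

# Toy example: token 0 is boilerplate (shuffling barely changes its loss),
# token 1 is a reasoning token (loss jumps after shuffling).
w = shad_weights(loss_shuffled=[0.5, 3.0], loss_original=[0.5, 0.5])
loss = reasoning_highlighted_loss(token_losses=[1.0, 2.0], weights=w)
```

In this toy setup the reasoning token receives a larger weight than the boilerplate token, so the aggregate loss is pulled toward the reasoning token's loss, which is the qualitative behavior the abstract describes for RFT.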
Dec-19-2024