OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Yuhang Wang, Jinlin Xiao, Jitao Sang


OpenAI's recent introduction of Reinforcement Fine-Tuning (RFT) showcases the potential of reasoning foundation models and offers a new fine-tuning paradigm that goes beyond simple pattern imitation. This technical report presents OpenRFT, our attempt to fine-tune generalist reasoning models for domain-specific tasks under the same settings as RFT. Evaluation on SciKnowEval shows that OpenRFT achieves notable performance gains with only 100 domain-specific samples per task. More experimental results will be reported in later versions.

OpenAI's o1 model has shown strong reasoning abilities in mathematics and programming, but its generalization to other tasks remains uncertain. The recent introduction of Reinforcement Fine-Tuning (RFT) (OpenAI, 2024) provides a promising avenue toward reasoning generalization. With only dozens of high-quality (question, answer) pairs, RFT enables the creation of customized reasoning models that excel at domain-specific tasks. The significance of RFT is at least two-fold. (1) It demonstrates the promise of using generalist reasoning models, such as o1, as reasoning foundation models: by enabling the efficient creation of domain-specific reasoning models, RFT practically expands the applicability of reasoning models across diverse tasks. (2) It represents a new fine-tuning paradigm: unlike Supervised Fine-Tuning (SFT), which merely mimics patterns in the training data, RFT leverages the model's reasoning capabilities to support thinking and trial-and-error learning.
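To make the trial-and-error training signal concrete, the following is a minimal, self-contained sketch of the idea described above: sample several reasoning attempts per question, grade each attempt against the reference answer with a rule-based grader, and turn the graded rollouts into advantages for a policy-gradient update. The `policy` callable, the exact-match grader, and all function names here are illustrative assumptions, not the paper's actual implementation or OpenAI's RFT API.

```python
# Conceptual sketch of an RFT-style training signal (not the authors' code).
# Assumption: `policy` is any callable that samples k candidate answers for a
# question; a real setup would sample from the reasoning model being tuned.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Sample:
    question: str
    answer: str  # reference answer from the small (question, answer) dataset

def exact_match_reward(candidate: str, reference: str) -> float:
    """Rule-based grader: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if candidate.strip().lower() == reference.strip().lower() else 0.0

def rft_step(policy: Callable[[str, int], List[str]],
             batch: List[Sample],
             k: int = 8) -> List[dict]:
    """One conceptual RFT step: sample k attempts per question, grade them,
    and emit (candidate, advantage) pairs for a policy-gradient optimizer
    (e.g., PPO) to consume when updating the model."""
    updates = []
    for ex in batch:
        candidates = policy(ex.question, k)  # trial-and-error rollouts
        rewards = [exact_match_reward(c, ex.answer) for c in candidates]
        baseline = sum(rewards) / len(rewards)  # per-question mean baseline
        for cand, r in zip(candidates, rewards):
            updates.append({"question": ex.question,
                            "candidate": cand,
                            "advantage": r - baseline})
    return updates

# Toy usage with a dummy policy that just guesses fixed strings.
if __name__ == "__main__":
    dummy_policy = lambda q, k: ["A", "B", "A", "C"][:k]
    batch = [Sample(question="Which option is correct?", answer="A")]
    for u in rft_step(dummy_policy, batch, k=4):
        print(u)
```

The contrast with SFT is visible in the loop: nothing here imitates a reference reasoning trace; only the graded outcome of each attempt shapes the update, which is why a hundred or even a few dozen labeled samples can suffice.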
