SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Yu, Dian, Peng, Baolin, Tian, Ye, Song, Linfeng, Mi, Haitao, Yu, Dong

Aug-28-2024–arXiv.org Artificial Intelligence

There is a growing trend of teaching large language models (LLMs) to solve mathematical problems through coding. Existing studies primarily focus on prompting powerful, closed-source models to generate seed training data followed by in-domain data augmentation, equipping LLMs with considerable capabilities for code-aided mathematical reasoning. However, continually training these models on augmented data derived from a few datasets such as GSM8K may impair their generalization abilities and restrict their effectiveness to a narrow range of question types. Conversely, the potential of improving such LLMs by leveraging large-scale, expert-written, diverse math question-answer pairs remains unexplored. To utilize these resources and tackle unique challenges such as code response assessment, we propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. We also explore different alignment algorithms with self-generated instruction/preference data to foster continuous improvement. Experiments across both in-domain (up to +5.7%) and out-of-domain (+4.4%) benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
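The abstract does not spell out how the critic is applied, so the following is only a minimal sketch of one plausible reading: candidate code responses are executed, a critic judges each result against the expert-written reference answer, and the verdicts are turned into instruction data and preference pairs. Every name here (`run_code`, `toy_critic`, `build_data`) is hypothetical, and `toy_critic` is a simple numeric comparison standing in for the learned code-based critic model, not the authors' method.

```python
import contextlib
import io
from typing import Callable, List, Optional, Tuple


def run_code(code: str) -> Optional[str]:
    """Execute a candidate program and return what it printed, or None on error."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustration only; a real pipeline needs a sandbox
    except Exception:
        return None
    return buffer.getvalue().strip()


def toy_critic(question: str, output: str, reference: str) -> bool:
    """Hypothetical stand-in for a learned critic: numeric match with tolerance."""
    try:
        return abs(float(output) - float(reference)) < 1e-6
    except ValueError:
        # Fall back to exact string comparison for non-numeric answers.
        return output.strip() == reference.strip()


def build_data(
    question: str,
    reference: str,
    candidates: List[str],
    critic: Callable[[str, str, str], bool] = toy_critic,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Split candidates by critic verdict into instruction and preference data."""
    accepted: List[str] = []
    rejected: List[str] = []
    for code in candidates:
        output = run_code(code)
        if output is not None and critic(question, output, reference):
            accepted.append(code)
        else:
            rejected.append(code)
    # Instruction data: (question, accepted code).
    instruction_data = [(question, code) for code in accepted]
    # Preference data: (question, chosen code, rejected code).
    preference_data = [
        (question, good, bad) for good in accepted for bad in rejected
    ]
    return instruction_data, preference_data


candidates = ["print(3 * 4)", "print(3 + 4)", "print(1 / 0)"]
sft, prefs = build_data("What is 3 times 4?", "12", candidates)
```

In this toy run, one candidate is accepted and the other two (a wrong answer and a runtime error) are rejected, which yields one instruction example and two preference pairs. Under this reading, the first list would feed supervised fine-tuning and the second a preference-based alignment algorithm.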

Country:
- North America > United States
  - Washington > King County > Bellevue (0.04)
- Asia > China
  - Guangxi Province > Nanning (0.04)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Education
  - Educational Setting > K-12 Education (0.46)
  - Curriculum > Subject-Specific Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.71)
