Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step

Li, Liunian Harold, Hessel, Jack, Yu, Youngjae, Ren, Xiang, Chang, Kai-Wei, Choi, Yejin

Jun-24-2023 – arXiv.org Artificial Intelligence

Chain-of-thought prompting (e.g., "Let's think step-by-step") primes large language models to verbalize rationales for their predictions. While chain-of-thought can lead to dramatic performance gains, benefits appear to emerge only for sufficiently large models (beyond 50B parameters). We show that orders-of-magnitude smaller models (125M to 1.3B parameters) can still benefit from chain-of-thought prompting. To achieve this, we introduce Symbolic Chain-of-Thought Distillation (SCoTD), a method to train a smaller student model on rationalizations sampled from a significantly larger teacher model. Experiments across several commonsense benchmarks show that: 1) SCoTD enhances the performance of the student model in both supervised and few-shot settings, and especially for challenge sets; 2) sampling many reasoning chains per instance from the teacher is paramount; and 3) after distillation, student chains of thought are judged by humans as comparable to the teacher's, despite orders of magnitude fewer parameters. We test several hypotheses regarding what properties of chain-of-thought samples are important, e.g., diversity vs. teacher likelihood vs. open-endedness. We release our corpus of chain-of-thought samples and code.
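
The data-collection step described in the abstract (sampling many reasoning chains per instance from a teacher, then training a student on them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' released code: `build_distillation_corpus`, `toy_teacher`, the `Q: / Rationale: / A:` text layout, and the default of 8 samples per instance are all hypothetical choices made here. In practice the stub teacher would be replaced by sampled generations from a large language model.

```python
import random
from typing import Callable, Dict, List

# Hypothetical interface: given a question and an RNG, return ONE sampled rationale.
TeacherSampler = Callable[[str, random.Random], str]


def build_distillation_corpus(
    instances: List[Dict[str, str]],
    sample_rationale: TeacherSampler,
    samples_per_instance: int = 8,  # arbitrary default; the abstract only says "many"
    seed: int = 0,
) -> List[str]:
    """Sample several teacher rationales per instance and format student training text."""
    rng = random.Random(seed)
    corpus: List[str] = []
    for inst in instances:
        for _ in range(samples_per_instance):
            rationale = sample_rationale(inst["question"], rng)
            # Assumed layout: question, then rationale, then the answer label.
            corpus.append(
                f"Q: {inst['question']}\nRationale: {rationale}\nA: {inst['answer']}"
            )
    return corpus


def toy_teacher(question: str, rng: random.Random) -> str:
    """Stand-in for a large teacher model; picks a canned opener at random."""
    openers = [
        "Let's think step-by-step.",
        "Consider what we know.",
        "First, recall the basics.",
    ]
    return f"{rng.choice(openers)} (sampled reasoning about: {question})"


if __name__ == "__main__":
    data = [{"question": "Can a penguin fly?", "answer": "no"}]
    for text in build_distillation_corpus(data, toy_teacher, samples_per_instance=2):
        print(text, end="\n\n")
```

The resulting strings would then serve as ordinary language-modeling targets for fine-tuning the smaller student model; the fine-tuning loop itself is omitted here.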

large language model, machine learning, natural language

Country:
- North America > United States
  - New York > Monroe County
    - Rochester (0.04)
  - California > Los Angeles County
    - Los Angeles (0.14)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.82)

Industry:
- Education (1.00)
- Leisure & Entertainment (0.93)
- Media > Film (0.46)
- Law > Statutes (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.69)
  - Machine Learning
    - Neural Networks > Deep Learning (0.47)
    - Inductive Learning (0.47)
