Distilling Mathematical Reasoning Capabilities into Small Language Models
Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang
arXiv.org Artificial Intelligence
This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion-parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process in equation-based representations, which are used to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to further enhance the reasoning performance of SLMs. ETD builds a reasoning dataset containing multiple thought formats, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and uses it for fine-tuning. Our experimental findings demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
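The ETD pipeline described above can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' released code: the record layout, function name, and placeholder rationales are all assumptions. The idea is simply that each question carries teacher-generated rationales in several thought formats (CoT, PoT, EoT), which are flattened into prompt/target pairs for supervised fine-tuning of an SLM.

```python
# Hypothetical sketch of assembling an Ensemble Thoughts Distillation (ETD)
# fine-tuning set. Record layout and names are illustrative assumptions.

def build_etd_dataset(samples):
    """Flatten per-question rationales in multiple thought formats
    into (prompt, target) pairs for supervised fine-tuning."""
    pairs = []
    for s in samples:
        for fmt, rationale in s["rationales"].items():
            # Tag the prompt with its thought format so the SLM learns
            # to produce the requested reasoning style.
            prompt = f"[{fmt}] Question: {s['question']}\nAnswer:"
            pairs.append({"prompt": prompt, "target": rationale})
    return pairs

# One GSM8K-style question with teacher rationales in all three formats
# (the rationale texts below are made-up placeholders).
samples = [{
    "question": "Tom has 3 apples and buys 2 more. How many does he have?",
    "rationales": {
        "CoT": "Tom starts with 3 apples and buys 2 more. 3 + 2 = 5. The answer is 5.",
        "PoT": "apples = 3\napples += 2\nprint(apples)",
        "EoT": "x = 3 + 2",
    },
}]

dataset = build_etd_dataset(samples)
print(len(dataset))  # one training pair per thought format
```

One design point this makes concrete: EoTD alone yields a single-format dataset, while ETD multiplies the supervision signal per question by the number of thought formats, which is what the paper credits for the additional reasoning gains.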
February 1, 2024