Distilling LLMs' Decomposition Abilities into Compact Language Models
Denis Tarasov, Kumar Shridhar
arXiv.org Artificial Intelligence
Large Language Models (LLMs) have demonstrated proficiency in reasoning, yet their large size presents scalability challenges and limits further customization. Compact models, in contrast, offer customizable training but often fall short on complex reasoning tasks. This study focuses on distilling LLMs' decomposition skills into compact models using offline reinforcement learning. We leverage the capabilities of LLMs to provide feedback and to generate a specialized, task-specific dataset for training compact models.

LLMs not only excel at straightforward tasks such as summarization and sentiment analysis but, with adept prompting, also demonstrate proficiency in reasoning tasks that demand mathematical and logical abilities (Huang & Chang, 2022). Notably, Chain-of-Thought (CoT) prompting (Wei et al., 2022) and its variants (Kojima et al., 2022; Wang et al., 2022) have proven to be promising and relatively simple techniques for enhancing LLMs' reasoning capabilities. Within complex reasoning, the ability to decompose an intricate question into a set of simpler sub-questions is a crucial yet understudied component (Shridhar et al., 2022). While existing work predominantly focuses on end-to-end solutions for reasoning (Zhou et al., 2022; Lyu et al., 2023), the specific step of breaking complex questions down into simpler components has received limited attention. The creation of specialized datasets and benchmarks is integral to advancing the field of Deep Learning (Guss et al., 2019; Vinyals et al., 2019; Fu et al., 2020; Kurenkov et al., 2023). This work addresses that gap in the study of the reasoning sub-questioning process by providing a dataset and baselines for further research in this direction.
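To make the decomposition setting concrete, the following is a minimal, hypothetical illustration of what one training record and its rendered prompt might look like. The record schema and the prompt wording are assumptions for illustration only; they are not taken from the paper's actual dataset.

```python
# Hypothetical decomposition training record: a complex question paired
# with the simpler sub-questions a teacher LLM might produce for it.
record = {
    "question": (
        "A store sold 12 apples in the morning and twice as many in the "
        "afternoon. How many apples were sold in total?"
    ),
    "subquestions": [
        "How many apples were sold in the afternoon?",
        "How many apples were sold in total?",
    ],
}

def format_prompt(record):
    """Render a record as a decomposition-style training prompt."""
    lines = [f"Question: {record['question']}", "Break it into sub-questions:"]
    lines += [f"{i}. {sq}" for i, sq in enumerate(record["subquestions"], 1)]
    return "\n".join(lines)

print(format_prompt(record))
```

A compact model fine-tuned on such pairs learns to emit the sub-question list given only the question.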
Compounding the challenge is the computational overhead associated with large model sizes, which makes reasoning tasks expensive and time-consuming when tuning models. Concurrently, approaches in the spirit of Chain-of-Thought (CoT) can be costly, since the models with the strongest reasoning abilities are not freely available. In response, distilling distinct components of the reasoning process into smaller models emerges as a promising avenue for research.
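One common pattern behind offline-RL-style distillation is to score teacher-generated decompositions with a scalar reward and keep only high-reward trajectories for training the compact model. The sketch below illustrates that curation step under stated assumptions: the function names, the threshold, and the toy reward (checking that a decomposition ends with the gold answer) are all illustrative, not the paper's actual reward design.

```python
# Hedged sketch of offline data curation for distillation: each candidate
# (prompt, decomposition) pair from the teacher gets a scalar reward, and
# only pairs clearing a threshold are kept as fine-tuning targets.

def curate(candidates, reward_fn, threshold=0.5):
    """Filter teacher outputs by reward before fine-tuning a compact model."""
    dataset = []
    for prompt, decomposition in candidates:
        r = reward_fn(prompt, decomposition)
        if r >= threshold:
            dataset.append({"prompt": prompt, "target": decomposition, "reward": r})
    return dataset

# Toy reward: a decomposition counts as correct if its final token matches
# the gold answer for that prompt (a stand-in for LLM or execution feedback).
gold = {"Q1": "24"}

def toy_reward(prompt, decomposition):
    return 1.0 if decomposition.strip().endswith(gold[prompt]) else 0.0

candidates = [
    ("Q1", "12 * 2 = 24; 12 + 24 = 36"),        # wrong final value -> dropped
    ("Q1", "afternoon = 12 * 2; answer = 24"),  # ends with gold -> kept
]
kept = curate(candidates, toy_reward)
```

In practice the kept records would then be used for supervised or reward-weighted fine-tuning of the compact model.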
2 Feb 2024