Distilling LLMs' Decomposition Abilities into Compact Language Models
Denis Tarasov, Kumar Shridhar
arXiv.org Artificial Intelligence
Large Language Models (LLMs) have demonstrated proficiency in reasoning, yet their large size presents scalability challenges and limits further customization. Compact models, in contrast, offer customizable training but often fall short on complex reasoning tasks. This study focuses on distilling LLMs' decomposition skills into compact models using offline reinforcement learning. We leverage the capabilities of LLMs to provide feedback and to generate a specialized, task-specific dataset for training compact models.

LLMs not only excel at straightforward tasks such as summarization and sentiment analysis but, with adept prompting, also demonstrate proficiency in reasoning tasks that demand mathematical and logical abilities (Huang & Chang, 2022). Notably, Chain-of-Thought (CoT) prompting (Wei et al., 2022) and its variants (Kojima et al., 2022; Wang et al., 2022) have proven to be promising and relatively simple techniques for enhancing LLMs' reasoning capabilities. Within complex reasoning, the ability to decompose an intricate question into a set of simpler sub-questions is a crucial yet understudied component (Shridhar et al., 2022). While existing work predominantly focuses on end-to-end solutions for reasoning (Zhou et al., 2022; Lyu et al., 2023), the specific step of breaking complex questions down into simpler components has received limited attention. The creation of specialized datasets and benchmarks is integral to advancing the field of Deep Learning (Guss et al., 2019; Vinyals et al., 2019; Fu et al., 2020; Kurenkov et al., 2023). This work addresses that gap in the study of the reasoning sub-questioning process by providing a dataset and baselines for further research in this direction.
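To make the decomposition setting concrete, the following is a minimal, hypothetical illustration of what one training record and its rendered prompt might look like. The record schema and the prompt wording are assumptions for illustration only; they are not taken from the paper's actual dataset.

```python
# Hypothetical decomposition training record: a complex question paired
# with the simpler sub-questions a teacher LLM might produce for it.
record = {
    "question": (
        "A store sold 12 apples in the morning and twice as many in the "
        "afternoon. How many apples were sold in total?"
    ),
    "subquestions": [
        "How many apples were sold in the afternoon?",
        "How many apples were sold in total?",
    ],
}

def format_prompt(record):
    """Render a record as a decomposition-style training prompt."""
    lines = [f"Question: {record['question']}", "Break it into sub-questions:"]
    lines += [f"{i}. {sq}" for i, sq in enumerate(record["subquestions"], 1)]
    return "\n".join(lines)

print(format_prompt(record))
```

A compact model fine-tuned on such pairs learns to emit the sub-question list given only the question.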
Compounding the challenge is the computational overhead associated with large model sizes, which makes reasoning tasks expensive and time-consuming when tuning models. Concurrently, approaches in the spirit of Chain-of-Thought (CoT) can be costly, since the models with the strongest reasoning abilities are not freely available. In response, distilling distinct components of the reasoning process into smaller models emerges as a promising avenue for research.
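One common pattern behind offline-RL-style distillation is to score teacher-generated decompositions with a scalar reward and keep only high-reward trajectories for training the compact model. The sketch below illustrates that curation step under stated assumptions: the function names, the threshold, and the toy reward (checking that a decomposition ends with the gold answer) are all illustrative, not the paper's actual reward design.

```python
# Hedged sketch of offline data curation for distillation: each candidate
# (prompt, decomposition) pair from the teacher gets a scalar reward, and
# only pairs clearing a threshold are kept as fine-tuning targets.

def curate(candidates, reward_fn, threshold=0.5):
    """Filter teacher outputs by reward before fine-tuning a compact model."""
    dataset = []
    for prompt, decomposition in candidates:
        r = reward_fn(prompt, decomposition)
        if r >= threshold:
            dataset.append({"prompt": prompt, "target": decomposition, "reward": r})
    return dataset

# Toy reward: a decomposition counts as correct if its final token matches
# the gold answer for that prompt (a stand-in for LLM or execution feedback).
gold = {"Q1": "24"}

def toy_reward(prompt, decomposition):
    return 1.0 if decomposition.strip().endswith(gold[prompt]) else 0.0

candidates = [
    ("Q1", "12 * 2 = 24; 12 + 24 = 36"),        # wrong final value -> dropped
    ("Q1", "afternoon = 12 * 2; answer = 24"),  # ends with gold -> kept
]
kept = curate(candidates, toy_reward)
```

In practice the kept records would then be used for supervised or reward-weighted fine-tuning of the compact model.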
2 Feb 2024