Rational Metareasoning for Large Language Models
De Sabbata, C. Nicolò, Sumers, Theodore R., Griffiths, Thomas L.
arXiv.org Artificial Intelligence
Prompting large language models (LLMs) to reason has emerged as a core technique for using them, deploying additional inference-time compute to improve task performance. However, as LLMs grow in both size and adoption, inference costs are becoming increasingly burdensome. This work introduces a novel approach based on computational models of metareasoning used in cognitive science, training LLMs to use intermediate reasoning steps only when necessary. We first develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning, then use this reward function with Expert Iteration to train the LLM. Compared to few-shot chain-of-thought prompting and STaR, our method significantly reduces inference costs (20-37% fewer tokens generated across three models) while maintaining task performance across diverse datasets.

Large language models (LLMs) rely on substantial computational power to handle complex problems (OpenAI et al., 2024; Chowdhery et al., 2022; de Vries, 2023). While initial studies mostly focused on the cost of training (Verdecchia et al., 2023), LLMs' widespread deployment has made inference-time costs an increasingly important factor. However, there is a fundamental tension between inference cost and task performance: many cost-reduction methods, such as model compression, lower costs at the expense of performance, while others, such as chain-of-thought prompting (CoT; Wei et al., 2023; Kojima et al., 2023), do the opposite, raising inference costs to enhance task performance (Snell et al., 2024). Notably, none of these approaches is adaptive: model compression and existing CoT methods lower or raise the inference cost on all queries, regardless of task complexity.
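The idea of a reward that credits reasoning only when it pays for itself can be illustrated with a minimal sketch. This is not the paper's exact formulation: the utility term (gain in probability of the correct answer over answering directly), the linear per-token cost, and the names `Candidate`, `voc_reward`, `select_for_training`, and `cost_per_token` are all assumptions made for illustration. The selection step only gestures at how an Expert Iteration loop might keep the highest-reward sample for fine-tuning.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One sampled response (hypothetical structure for this sketch)."""
    reasoning_tokens: int  # length of the intermediate reasoning; 0 means a direct answer
    p_correct: float       # assumed: probability of the correct answer given this reasoning


def voc_reward(candidate, p_correct_direct, cost_per_token=0.001):
    """Value-of-Computation-style reward: benefit of reasoning minus its cost.

    Assumption: utility is the gain in correct-answer probability over
    answering directly, and cost grows linearly with generated tokens.
    """
    utility = candidate.p_correct - p_correct_direct
    cost = cost_per_token * candidate.reasoning_tokens
    return utility - cost


def select_for_training(candidates, p_correct_direct, cost_per_token=0.001):
    """Keep the highest-reward sample, as a filtering step might in Expert Iteration."""
    return max(candidates, key=lambda c: voc_reward(c, p_correct_direct, cost_per_token))


# Easy query: the direct answer is already reliable, so long reasoning is penalized.
easy = [Candidate(0, 0.95), Candidate(120, 0.97)]
print(select_for_training(easy, p_correct_direct=0.95).reasoning_tokens)

# Hard query: reasoning raises accuracy enough to outweigh its token cost.
hard = [Candidate(0, 0.20), Candidate(120, 0.90)]
print(select_for_training(hard, p_correct_direct=0.20).reasoning_tokens)
```

Under these assumed numbers, the direct answer wins on the easy query and the reasoning chain wins on the hard one, which is the adaptive behavior the abstract describes.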
Dec-21-2024
- Country:
- Asia > Middle East
- Jordan (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States (0.04)
- Genre:
- Research Report > Promising Solution (0.48)
- Industry:
- Education > Curriculum > Subject-Specific Education (0.68)
- Technology: